Merge pull request #163 from LaunchCodeEducation/audit-cleaning-pandas

jwoolbright23 · web-flow · commit 3b379e74aa65 · 2025-11-17T17:35:26.000-06:00
Audit for cleaning data with pandas complete, moving to exercise and …
diff --git a/content/cleaning-pandas/reading/inconsistent-data/_index.md b/content/cleaning-pandas/reading/inconsistent-data/_index.md
@@ -25,7 +25,7 @@ We are going to start by converting everything to numbers. Once everything in th
 1. First, let's focus in on turning `'True'` to `'1'`.
 
    ```python
-   etsy_sellers = etsy_sellers.loc[etsy_sellers['Star_Seller'] == 'True'] = '1'
+   etsy_sellers.loc[etsy_sellers['Star_Seller'] == 'True', 'Star_Seller'] = '1'
    ```
 
    This code will replace all the values in the `Star_Seller` column with `'1'` only if that value is currently equal to `'True'`.
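
As a minimal sketch of the corrected `.loc` pattern above, the following uses a small made-up dataframe (the `Shop` column and its values are hypothetical stand-ins, not the lesson's actual data). The first argument to `.loc` selects rows with a boolean mask, and the second names the one column to overwrite.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's etsy_sellers dataframe.
etsy_sellers = pd.DataFrame({
    "Shop": ["A", "B", "C"],
    "Star_Seller": ["True", "False", "True"],
})

# Row selector: a boolean mask. Column selector: the single column to overwrite.
mask = etsy_sellers["Star_Seller"] == "True"
etsy_sellers.loc[mask, "Star_Seller"] = "1"

print(etsy_sellers["Star_Seller"].tolist())  # ['1', 'False', '1']
```

Naming the column in the second position keeps the assignment from touching other columns, and the dataframe is modified in place, so nothing needs to be reassigned.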
diff --git a/content/cleaning-pandas/reading/irregular-data/_index.md b/content/cleaning-pandas/reading/irregular-data/_index.md
@@ -28,11 +28,11 @@ However, when we use the `descibe()` function and look more closely at `Total_Ra
 
 ```python
 
-outlier = np.where((etsy_sellers['Total_Rating'] < 0.0) & (etsy_sellers['Total_Rating'] > 5.0))
+outlier = np.where((etsy_sellers['Total_Rating'] < 0.0) | (etsy_sellers['Total_Rating'] > 5.0))
 etsy_sellers = etsy_sellers.drop(etsy_sellers.index[outlier[0]])
 ```
 
-Even though we can visually see where Sierra's Stationary is in the dataframe, if we have one row that is off, we might have others. `np.where()` returns a list of all indices where the condition is met. In this case, the condition is that the rating must be greater than or equal to 0 and less than or equal to 5.
+Even though we can visually see where Sierra's Stationary is in the dataframe, if one row is off, others might be too. `np.where()` returns the positions (as a tuple of index arrays) of every row where the condition is met. In this case, the condition finds ratings that are outside the valid range (less than 0 or greater than 5).
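
To illustrate why the operator matters, here is a small sketch on made-up ratings (the dataframe contents and the names `never`, `outlier_positions`, and `cleaned` are assumptions for illustration). No number can be both below 0 and above 5, so the `&` form never matches, while the `|` form flags anything outside the valid range.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's etsy_sellers dataframe.
etsy_sellers = pd.DataFrame({
    "Shop": ["A", "B", "C", "D"],
    "Total_Rating": [4.8, 45.0, -1.0, 3.2],
})

ratings = etsy_sellers["Total_Rating"]

# No value is both below 0 and above 5, so the `&` version finds nothing.
never = np.where((ratings < 0.0) & (ratings > 5.0))[0]

# The `|` version flags every rating outside the valid 0-5 range.
outlier_positions = np.where((ratings < 0.0) | (ratings > 5.0))[0]

# np.where gives row positions, so convert them to index labels before dropping.
cleaned = etsy_sellers.drop(etsy_sellers.index[outlier_positions])

print(len(never))                  # 0
print(outlier_positions.tolist())  # [1, 2]
print(cleaned["Shop"].tolist())    # ['A', 'D']
```

Because `drop()` returns a new dataframe rather than changing the original, the result is stored in a separate variable here.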
 
 We can also use visualizations to detect outliers. 
 
diff --git a/content/cleaning-pandas/reading/missing-data/_index.md b/content/cleaning-pandas/reading/missing-data/_index.md
@@ -4,7 +4,7 @@ draft = false
 weight = 2
 +++
 
-Missing data is when a value for either a row or column is not actually there. pandas has different data types for missing data so when you print out a row of a dataframe where data is missing you will see one of these data types. pandas has a number of built-in methods that can handle missing data. `None` and `NaN` both hold missing values, however, the two are not actually equivalent. The boolean expression `None == nan` evaluates to `False`. This is because `None` is a Python object and `NaN` is a floating point value. If you find yourself needing to code a custom solution to handle an issue related to missing data, you might need to keep this in mind!
+Missing data is when a value for either a row or column is not actually there. pandas has different data types for missing data, so when you print out a row of a dataframe where data is missing, you will see one of these data types. pandas has a number of built-in methods that can handle missing data. `None` and `NaN` both represent missing values; however, the two are not equivalent. The boolean expression `None == np.nan` evaluates to `False` because `None` is a Python object and `NaN` is a floating point value. Additionally, even `np.nan == np.nan` evaluates to `False`: NaN is defined to be unequal to every value, including itself. If you find yourself needing to code a custom solution to handle an issue related to missing data, you might need to keep this in mind!
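
The comparisons described above can be checked directly. This is a small sketch, with a made-up `values` series, showing that equality tests are unreliable for finding missing data and that `isna()` is one of the pandas built-ins that handles both `None` and `NaN`.

```python
import numpy as np
import pandas as pd

# Equality checks cannot be used to find missing values.
print(None == np.nan)    # False
print(np.nan == np.nan)  # False

# Hypothetical series; in a float series, pandas stores None as NaN.
values = pd.Series([1.0, np.nan, None])

# isna() flags both kinds of missing value.
missing = values.isna()
print(missing.tolist())                # [False, True, True]
print(pd.isna(None), pd.isna(np.nan))  # True True
```

A custom cleaning function would typically rely on `pd.isna()` (or `Series.isna()`) rather than `==` for this reason.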
 
 {{% notice blue Note %}}
 
diff --git a/content/cleaning-pandas/studio/_index.md b/content/cleaning-pandas/studio/_index.md
@@ -7,7 +7,7 @@ weight = 3
 
 ## Getting Started
 
-For this weeks studio you will be working with a partner.  
+For this week's studio you will be working with a partner.  
 
 Each of you should work in your notebook that can be found at `data-analysis-projects/cleaning-data-with-pandas/studio`.
 You will also notice that we have already added a CSV for you. This CSV is a subset of a larger dataset on [Kaggle](https://www.kaggle.com/datasets/usda/a-year-of-pumpkin-prices) from the USDA. You only need this one CSV to do the studio.