Commit 3b379e7

Merge pull request #163 from LaunchCodeEducation/audit-cleaning-pandas
Audit for cleaning data with pandas complete, moving to exercise and …
2 parents f3ff310 + 779f5c0 commit 3b379e7

4 files changed

Lines changed: 5 additions & 5 deletions

content/cleaning-pandas/reading/inconsistent-data/_index.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ We are going to start by converting everything to numbers. Once everything in th
 1. First, let's focus in on turning `'True'` to `'1'`.
 
 ```python
-etsy_sellers = etsy_sellers.loc[etsy_sellers['Star_Seller'] == 'True'] = '1'
+etsy_sellers.loc[etsy_sellers['Star_Seller'] == 'True', 'Star_Seller'] = '1'
 ```
 
 This code will replace all the values in the `Star_Seller` column with `'1'` only if that value is currently equal to `'True'`.
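
Review note: a minimal sketch of the corrected `.loc` assignment, run against a small made-up dataframe. The shop names and values are hypothetical stand-ins, not rows from the course dataset. The removed line was a chained assignment, which rebinds `etsy_sellers` to the string `'1'` before the `.loc` target is evaluated; the added line names both the row mask and the target column.

```python
import pandas as pd

# Hypothetical stand-in for the etsy_sellers data used in the reading.
etsy_sellers = pd.DataFrame({
    "Shop": ["Shop A", "Shop B", "Shop C"],
    "Star_Seller": ["True", "False", "True"],
})

# Boolean mask selecting the rows to change.
is_star = etsy_sellers["Star_Seller"] == "True"

# Row selector first, column label second: only Star_Seller is overwritten.
etsy_sellers.loc[is_star, "Star_Seller"] = "1"

print(etsy_sellers["Star_Seller"].tolist())  # ['1', 'False', '1']
```

Without the column label, the same assignment would write `'1'` into every column of the matching rows, which is likely why the column argument was added.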

content/cleaning-pandas/reading/irregular-data/_index.md

Lines changed: 2 additions & 2 deletions
@@ -28,11 +28,11 @@ However, when we use the `descibe()` function and look more closely at `Total_Ra
 
 ```python
 
-outlier = np.where((etsy_sellers['Total_Rating'] < 0.0) & (etsy_sellers['Total_Rating'] > 5.0))
+outlier = np.where((etsy_sellers['Total_Rating'] < 0.0) | (etsy_sellers['Total_Rating'] > 5.0))
 etsy_sellers.drop(etsy_sellers[outlier])
 ```
 
-Even though we can visually see where Sierra's Stationary is in the dataframe, if we have one row that is off, we might have others. `np.where()` returns a list of all indices where the condition is met. In this case, the condition is that the rating must be greater than or equal to 0 and less than or equal to 5.
+Even though we can visually see where Sierra's Stationary is in the dataframe, if we have one row that is off, we might have others. `np.where()` returns a list of all indices where the condition is met. In this case, the condition finds ratings that are outside the valid range (less than 0 or greater than 5).
 
 We can also use visualizations to detect outliers.
 
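
Review note: a small sketch of the out-of-range check this hunk changes, using made-up ratings rather than the course dataset. It shows that the old `&` condition can never match, since no rating is both below 0 and above 5, while the new `|` condition flags both cases. Called with only a condition, `np.where` returns a tuple of position arrays, so the sketch takes element `[0]` and converts positions to index labels before calling `drop`. That conversion is an assumption about the intended behavior of the unchanged `drop` line, not something this commit does.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings; two of them fall outside the valid 0-5 range.
etsy_sellers = pd.DataFrame({
    "Shop": ["Shop A", "Shop B", "Shop C", "Shop D"],
    "Total_Rating": [4.8, -1.0, 5.6, 3.2],
})

ratings = etsy_sellers["Total_Rating"]

# Old condition: a value cannot be < 0 and > 5 at once, so nothing matches.
impossible = (ratings < 0.0) & (ratings > 5.0)

# New condition: flags anything outside the valid range.
out_of_range = (ratings < 0.0) | (ratings > 5.0)

# np.where(condition) returns a tuple with one array of positions per dimension.
positions = np.where(out_of_range)[0]

# drop() works on index labels, so translate positions into labels first.
cleaned = etsy_sellers.drop(etsy_sellers.index[positions])

print(impossible.any())          # False
print(positions.tolist())        # [1, 2]
print(cleaned["Shop"].tolist())  # ['Shop A', 'Shop D']
```

`drop` returns a new dataframe, so the result has to be assigned to a name (here `cleaned`) for the removal to persist.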

content/cleaning-pandas/reading/missing-data/_index.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ draft = false
 weight = 2
 +++
 
-Missing data is when a value for either a row or column is not actually there. pandas has different data types for missing data so when you print out a row of a dataframe where data is missing you will see one of these data types. pandas has a number of built-in methods that can handle missing data. `None` and `NaN` both hold missing values, however, the two are not actually equivalent. The boolean expression `None == nan` evaluates to `False`. This is because `None` is a Python object and `NaN` is a floating point value. If you find yourself needing to code a custom solution to handle an issue related to missing data, you might need to keep this in mind!
+Missing data is when a value for either a row or column is not actually there. pandas has different data types for missing data so when you print out a row of a dataframe where data is missing you will see one of these data types. pandas has a number of built-in methods that can handle missing data. `None` and `NaN` both hold missing values, however, the two are not actually equivalent. The boolean expression `None == np.nan` evaluates to `False`. Additionally, even `np.nan == np.nan` evaluates to `False` - this is a special property of NaN values! This is because `None` is a Python object and `NaN` is a floating point value. If you find yourself needing to code a custom solution to handle an issue related to missing data, you might need to keep this in mind!
 
 {{% notice blue Note %}}
 
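
Review note: a short sketch of the comparisons the revised paragraph describes, with a made-up three-element Series. One caveat on the wording: the `np.nan == np.nan` result follows from IEEE 754 floating-point comparison rules, while the object-versus-float distinction explains why `None` and `NaN` are different values. Because equality checks are unreliable here, `isna()` is the usual way to test for missing values.

```python
import numpy as np
import pandas as pd

# None is a Python object; np.nan is a float.
print(type(None).__name__, type(np.nan).__name__)  # NoneType float

# Neither comparison is True, so == cannot be used to find missing values.
print(None == np.nan)    # False
print(np.nan == np.nan)  # False

# Hypothetical data: pandas stores both None and np.nan as NaN in a float Series.
values = pd.Series([1.0, None, np.nan])
missing = values.isna()

print(missing.tolist())  # [False, True, True]
```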

content/cleaning-pandas/studio/_index.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ weight = 3
 
 ## Getting Started
 
-For this weeks studio you will be working with a partner.
+For this week's studio you will be working with a partner.
 
 Each of you should work in your notebook that can be found at `data-analysis-projects/cleaning-data-with-pandas/studio`.
 You will also notice that we have already added a CSV for you. This CSV is a subset of a larger dataset on [Kaggle](https://www.kaggle.com/datasets/usda/a-year-of-pumpkin-prices) from the USDA. You only need this one CSV to do the studio.
