| title_meta | Chapter 4 | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| title | NumPy | |||||||||||||||
| description | NumPy is a fundamental Python package to efficiently practice data science. Learn to work with powerful tools in the NumPy array, and get started with data exploration. | |||||||||||||||
| attachments |
|
|||||||||||||||
| lessons |
|
type: VideoExercise
key: f4545baa53
xp: 50@projector_key
a0487c26210f6b71ea98f917734cea3a
type: NormalExercise
key: 84cab9d170
lang: python
xp: 100
skills:
- 2You're now going to dive into the world of baseball. Along the way, you'll get comfortable with the basics of numpy, a powerful package to do data science.
A list baseball has already been defined in the Python script, representing the height of some baseball players in centimeters. Can you add some code to create a numpy array from it?
@instructions
- Import the
numpypackage asnp, so that you can refer tonumpywithnp. - Use
np.array()to create anumpyarray frombaseball. Name this arraynp_baseball. - Print out the type of
np_baseballto check that you got it right.
@hint
import numpy as npwill do the trick. Now, you have to usenp.fun_name()whenever you want to use anumpyfunction.np.array()should take on inputbaseball. Assign the result of the function call tonp_baseball.- To print out the type of a variable
x, simply typeprint(type(x)).
@pre_exercise_code
import numpy as np
@sample_code
# Import the numpy package as np
baseball = [180, 215, 210, 210, 188, 176, 209, 200]
# Create a numpy array from baseball: np_baseball
# Print out type of np_baseball
@solution
# Import the numpy package as np
import numpy as np
baseball = [180, 215, 210, 210, 188, 176, 209, 200]
# Create a NumPy array from baseball: np_baseball
np_baseball = np.array(baseball)
# Print out type of np_baseball
print(type(np_baseball))
@sct
predef_msg = "You don't have to change or remove the predefined variables."
Ex().has_import("numpy")
Ex().check_correct(
check_object("np_baseball"),
multi(
check_object("baseball", missing_msg=predef_msg).has_equal_value(incorrect_msg=predef_msg),
check_function("numpy.array").check_args(0).has_equal_ast()
)
)
Ex().has_printout(0)
success_msg("Great job!")
type: NormalExercise
key: e7e25a89ea
lang: python
xp: 100
skills:
- 2You are a huge baseball fan. You decide to call the MLB (Major League Baseball) and ask around for some more statistics on the height of the main players. They pass along data on more than a thousand players, which is stored as a regular Python list: height_in. The height is expressed in inches. Can you make a numpy array out of it and convert the units to meters?
height_in is already available and the numpy package is loaded, so you can start straight away (Source: stat.ucla.edu).
@instructions
- Create a
numpyarray fromheight_in. Name this new arraynp_height_in. - Print
np_height_in. - Multiply
np_height_inwith0.0254to convert all height measurements from inches to meters. Store the new values in a new array,np_height_m. - Print out
np_height_mand check if the output makes sense.
@hint
- Use
np.array()and pass itheight. Store the result innp_height_in. - To print out a variable
x, typeprint(x)in the Python script. - Perform calculations as if
np_height_inis a single number:np_height_in * conversion_factoris part of the answer. - To print out a variable
x, typeprint(x)in the Python script.
@pre_exercise_code
import pandas as pd
mlb = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")
height_in = mlb['Height'].tolist()
import numpy as np
@sample_code
# Import numpy
import numpy as np
# Create a numpy array from height_in: np_height_in
# Print out np_height_in
# Convert np_height_in to m: np_height_m
# Print np_height_m
@solution
# Import numpy
import numpy as np
# Create a numpy array from height_in: np_height_in
np_height_in = np.array(height_in)
# Print out np_height_in
print(np_height_in)
# Convert np_height_in to m: np_height_m
np_height_m = np_height_in * 0.0254
# Print np_height_m
print(np_height_m)
@sct
Ex().has_import("numpy", same_as = False)
Ex().check_correct(
has_printout(0),
check_correct(
check_object('np_height_in').has_equal_value(),
check_function('numpy.array').check_args(0).has_equal_ast()
)
)
Ex().check_correct(
has_printout(1),
check_object("np_height_m").has_equal_value(incorrect_msg = "Use `np_height_in * 0.0254` to calculate `np_height_m`.")
)
success_msg("Nice! In the blink of an eye, `numpy` performs multiplications on more than 1000 height measurements.")
type: MultipleChoiceExercise
key: 3662ff6637
lang: python
xp: 50
skills:
- 2numpy is great for doing vector arithmetic. If you compare its functionality with regular Python lists, however, some things are different.
First, numpy arrays cannot contain elements with different types. If you mix types, like booleans and integers, numpy automatically converts them to a common type. Booleans like True and False are treated as 1 and 0 when combined with numbers, so the array ends up as integers.
Second, the typical arithmetic operators, such as +, -, * and / have a different meaning for regular Python lists and numpy arrays.
Select the code that results in the following output:
np.array([True, 1, 2]) + np.array([3, 4, False])
The numpy package is already imported as np. You can run each option in the IPython Shell to see the output.
@possible_answers
np.array([True, 1, 2, 3, 4, False])np.array([4, 3, 0]) + np.array([0, 2, 2])np.array([1, 1, 2]) + np.array([3, 4, -1])np.array([0, 1, 2, 3, 4, 5])
@hint
- Copy the different code chunks and paste them in the IPython Shell. Hit enter to run the code and see which output matches the one generated by
np.array([True, 1, 2]) + np.array([3, 4, False]).
@pre_exercise_code
import numpy as np
@sct
msg1 = msg3 = msg4 = "Incorrect. Try out the different code chunks and see which one matches the target code chunk."
msg2 = "Great job! `True` is converted to 1, `False` is converted to 0."
Ex().has_chosen(2, [msg1, msg2, msg3, msg4])
type: NormalExercise
key: fcb2a9007b
lang: python
xp: 100
skills:
- 2Subsetting (using the square bracket notation on lists or arrays) works exactly the same with both lists and arrays.
This exercise already has two lists, height_in and weight_lb, loaded in the background for you. These contain the height and weight of the MLB players as regular lists. It also has two numpy array lists, np_weight_lb and np_height_in prepared for you.
@instructions
- Subset
np_weight_lbby printing out the element at index 50. - Print out a sub-array of
np_height_inthat contains the elements at index 100 up to and including index 110.
@hint
- Make sure to wrap a
print()call around your subsetting operations. - Use
[100:111]to get the elements from index 100 up to and including index 110.
@pre_exercise_code
import pandas as pd
mlb = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")
height_in = mlb['Height'].tolist()
weight_lb = mlb['Weight'].tolist()
@sample_code
import numpy as np
np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)
# Print out the weight at index 50
# Print out sub-array of np_height_in: index 100 up to and including index 110
@solution
import numpy as np
np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)
# Print out the weight at index 50
print(np_weight_lb[50])
# Print out sub-array of np_height_in: index 100 up to and including index 110
print(np_height_in[100:111])
@sct
Ex().has_import("numpy", same_as=False)
msg = "You don't have to change or remove the predefined variables."
Ex().multi(
check_object("np_height_in", missing_msg=msg).has_equal_value(incorrect_msg = msg),
check_object("np_weight_lb", missing_msg=msg).has_equal_value(incorrect_msg = msg)
)
Ex().has_printout(0)
Ex().has_printout(1)
success_msg("Nice! Time to learn something new: 2D NumPy arrays!")
type: VideoExercise
key: 1241efac7a
xp: 50@projector_key
ae3238dcc7feb9adecfee0c395fc8dc8
type: NormalExercise
key: 5cb045bb13
lang: python
xp: 100
skills:
- 2Before working on the actual MLB data, let's try to create a 2D numpy array from a small list of lists.
In this exercise, baseball is a list of lists. The main list contains 4 elements. Each of these elements is a list containing the height and the weight of 4 baseball players, in this order. baseball is already coded for you in the script.
@instructions
- Use
np.array()to create a 2Dnumpyarray frombaseball. Name itnp_baseball. - Print out the type of
np_baseball. - Print out the
shapeattribute ofnp_baseball. Usenp_baseball.shape.
@hint
baseballis already coded for you in the script. Callnp.array()on it and store the resulting 2Dnumpyarray innp_baseball.- Use
print()in combination withtype()for the second instruction. np_baseball.shapewill give you the dimensions of thenp_baseball. Make sure to wrap aprint()call around it.
@pre_exercise_code
@sample_code
import numpy as np
baseball = [[180, 78.4],
[215, 102.7],
[210, 98.5],
[188, 75.2]]
# Create a 2D numpy array from baseball: np_baseball
# Print out the type of np_baseball
# Print out the shape of np_baseball
@solution
import numpy as np
baseball = [[180, 78.4],
[215, 102.7],
[210, 98.5],
[188, 75.2]]
# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)
# Print out the type of np_baseball
print(type(np_baseball))
# Print out the shape of np_baseball
print(np_baseball.shape)
@sct
msg = "You don't have to change or remove the predefined variables."
Ex().check_object("baseball", missing_msg=msg).has_equal_value(incorrect_msg = msg)
Ex().has_import("numpy", same_as = False)
Ex().check_correct(
multi(
has_printout(0),
has_printout(1)
),
check_correct(
check_object('np_baseball').has_equal_value(),
check_function('numpy.array').check_args(0).has_equal_ast()
)
)
success_msg("Great! You're ready to convert the actual MLB data to a 2D `numpy` array now!")
type: NormalExercise
key: 5df25d0b7b
lang: python
xp: 100
skills:
- 2You realize that it makes more sense to restructure all this information in a 2D numpy array.
You have a Python list of lists. In this list of lists, each sublist represents the height and weight of a single baseball player. The name of this list is baseball and it has been loaded for you already (although you can't see it).
Store the data as a 2D array to unlock numpy's extra functionality.
@instructions
- Use
np.array()to create a 2Dnumpyarray frombaseball. Name itnp_baseball. - Print out the
shapeattribute ofnp_baseball.
@hint
baseballis already available in the Python environment. Callnp.array()on it and store the resulting 2Dnumpyarray innp_baseball.np_baseball.shapewill give the dimensions of thenp_baseball. Make sure to wrap aprint()call around it.
@pre_exercise_code
import pandas as pd
baseball = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")[['Height', 'Weight']].to_numpy().tolist()
import numpy as np
@sample_code
import numpy as np
# Create a 2D numpy array from baseball: np_baseball
np_baseball =
# Print out the shape of np_baseball
@solution
import numpy as np
# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)
# Print out the shape of np_baseball
print(np_baseball.shape)
@sct
Ex().has_import("numpy", same_as = False)
Ex().check_correct(
has_printout(0),
check_correct(
check_object('np_baseball').has_equal_value(),
check_function('numpy.array').check_args(0).has_equal_ast()
)
)
success_msg("Slick! Time to show off some killer features of multi-dimensional `numpy` arrays!")
type: NormalExercise
key: aeca4977f0
lang: python
xp: 100
skills:
- 2If your 2D numpy array has a regular structure, i.e. each row and column has a fixed number of values, complicated ways of subsetting become very easy. Have a look at the code below where the elements "a" and "c" are extracted from a list of lists.
# numpy
import numpy as np
np_x = np.array(x)
np_x[:, 0]
The indexes before the comma refer to the rows, while those after the comma refer to the columns. The : is for slicing; in this example, it tells Python to include all rows.
@instructions
- Print out the 50th row of
np_baseball. - Make a new variable,
np_weight_lb, containing the entire second column ofnp_baseball. - Select the height (first column) of the 124th baseball player in
np_baseballand print it out.
@hint
- You need row index 49 in the first instruction! More specifically, you'll want to use
[49, :]. - To select the entire second column, you'll need
[:, 1]. - For the last instruction, use
[123, 0]; don't forget to wrap it all in aprint()statement.
@pre_exercise_code
import pandas as pd
baseball = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")[['Height', 'Weight']].to_numpy().tolist()
import numpy as np
@sample_code
import numpy as np
np_baseball = np.array(baseball)
# Print out the 50th row of np_baseball
# Select the entire second column of np_baseball: np_weight_lb
# Print out height of 124th player
@solution
import numpy as np
np_baseball = np.array(baseball)
# Print out the 50th row of np_baseball
print(np_baseball[49,:])
# Select the entire second column of np_baseball: np_weight_lb
np_weight_lb = np_baseball[:,1]
# Print out height of 124th player
print(np_baseball[123, 0])
@sct
msg = "You don't have to change or remove the predefined variables."
Ex().multi(
has_import("numpy", same_as = False),
check_object("np_baseball", missing_msg=msg).has_equal_value(incorrect_msg = msg)
)
Ex().has_printout(0)
Ex().check_object('np_weight_lb').has_equal_value(incorrect_msg = "You can use `np_baseball[:,1]` to define `np_weight_lb`. This will select the entire first column.")
Ex().has_printout(1)
success_msg("This is going well!")
type: NormalExercise
key: 1c2378b677
lang: python
xp: 100
skills:
- 22D numpy arrays can perform calculations element by element, like numpy arrays.
np_baseball is coded for you; it's again a 2D numpy array with 3 columns representing height (in inches), weight (in pounds) and age (in years). baseball is available as a regular list of lists and updated is available as 2D numpy array.
@instructions
- You managed to get hold of the changes in height, weight and age of all baseball players. It is available as a 2D
numpyarray,updated. Addnp_baseballandupdatedand print out the result. - You want to convert the units of height and weight to metric (meters and kilograms, respectively). As a first step, create a
numpyarray with three values:0.0254,0.453592and1. Name this arrayconversion. - Multiply
np_baseballwithconversionand print out the result.
@hint
np_baseball + updatedwill do an element-wise summation of the twonumpyarrays.- Create a
numpyarray withnp.array(); the input is a regular Python list with three elements. np_baseball * conversionwill work, without extra work. Try out it! Make sure to wrap it in aprint()call.
@pre_exercise_code
import pandas as pd
import numpy as np
baseball = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")[['Height', 'Weight', 'Age']].to_numpy().tolist()
n = len(baseball)
updated = np.array(pd.read_csv("https://assets.datacamp.com/course/intro_to_python/update.csv", header = None))
import numpy as np
@sample_code
import numpy as np
np_baseball = np.array(baseball)
# Print out addition of np_baseball and updated
# Create numpy array: conversion
# Print out product of np_baseball and conversion
@solution
import numpy as np
np_baseball = np.array(baseball)
# Print out addition of np_baseball and updated
print(np_baseball + updated)
# Create numpy array: conversion
conversion = np.array([0.0254, 0.453592, 1])
# Print out product of np_baseball and conversion
print(np_baseball * conversion)
@sct
Ex().has_import("numpy")
msg = "You don't have to change or remove the predefined variables."
Ex().check_object("np_baseball", missing_msg=msg).has_equal_value(incorrect_msg = msg)
Ex().has_printout(0)
Ex().check_correct(
has_printout(1),
check_correct(
check_object('conversion').has_equal_value(),
check_function('numpy.array', index = 1).check_args(0).has_equal_value()
)
)
success_msg("Great job! Notice how with very little code, you can change all values in your `numpy` data structure in a very specific way. This will be very useful in your future as a data scientist!")
type: VideoExercise
key: 287995e488
xp: 50@projector_key
34495ba457d74296794d2a122c9b6e19
type: NormalExercise
key: 509c588eb6
lang: python
xp: 100
skills:
- 2You now know how to use numpy functions to get a better feeling for your data.
The baseball data is available as a 2D numpy array with 3 columns (height, weight, age) and 1015 rows. The name of this numpy array is np_baseball. After restructuring the data, however, you notice that some height values are abnormally high. Follow the instructions and discover which summary statistic is best suited if you're dealing with so-called outliers. np_baseball is available.
@instructions
- Create
numpyarraynp_height_inthat is equal to first column ofnp_baseball. - Print out the mean of
np_height_in. - Print out the median of
np_height_in.
@hint
- Use 2D
numpysubsetting:[:,0]is a part of the solution. - If
numpyis imported asnp, you can usenp.mean()to get the mean of a NumPy array. Don't forget to throw in aprint()call. - For the last instruction, use
np.median().
@pre_exercise_code
import pandas as pd
np_baseball = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")[['Height', 'Weight', 'Age']].to_numpy()
np_baseball[slice(0, 1015, 50), 0] = np_baseball[slice(0, 1015, 50), 0]*1000
import numpy as np
@sample_code
import numpy as np
# Create np_height_in from np_baseball
# Print out the mean of np_height_in
# Print out the median of np_height_in
@solution
import numpy as np
# Create np_height_in from np_baseball
np_height_in = np_baseball[:,0]
# Print out the mean of np_height_in
print(np.mean(np_height_in))
# Print out the median of np_height_in
print(np.median(np_height_in))
@sct
Ex().has_import("numpy", same_as = False)
Ex().check_object("np_height_in").has_equal_value(incorrect_msg = "You can use `np_baseball[:,0]` to select the first column from `np_baseball`"),
Ex().check_correct(
has_printout(0),
check_function('numpy.mean').has_equal_value()
)
Ex().check_correct(
has_printout(1),
check_function('numpy.median').has_equal_value()
)
success_msg("An average height of 1586 inches, that doesn't sound right, does it? However, the median does not seem affected by the outliers: 74 inches makes perfect sense. It's always a good idea to check both the median and the mean, to get an idea about the overall distribution of the entire dataset.")
type: NormalExercise
key: '4409948807'
lang: python
xp: 100
skills:
- 2Because the mean and median are so far apart, you decide to complain to the MLB. They find the error and send the corrected data over to you. It's again available as a 2D NumPy array np_baseball, with three columns.
The Python script in the editor already includes code to print out informative messages with the different summary statistics and numpy is already loaded as np. Can you finish the job? np_baseball is available.
@instructions
- The code to print out the mean height is already included. Complete the code for the median height.
- Use
np.std()on the first column ofnp_baseballto calculatestddev. - Do big players tend to be heavier? Use
np.corrcoef()to store the correlation between the first and second column ofnp_baseballincorr.
@hint
- Use
np.median()to calculate the median. Make sure to select to correct column first! - Subset the same column when calculating the standard deviation with
np.std(). - Use
np_baseball[:, 0]andnp_baseball[:, 1]to select the first and second columns; these are the inputs tonp.corrcoef().
@pre_exercise_code
import pandas as pd
np_baseball = pd.read_csv("https://assets.datacamp.com/course/intro_to_python/baseball.csv")[['Height', 'Weight', 'Age']].to_numpy()
import numpy as np
@sample_code
avg = np.mean(np_baseball[:,0])
print("Average: " + str(avg))
# Print median height
med = ____
print("Median: " + str(med))
# Print out the standard deviation on height
stddev = ____
print("Standard Deviation: " + str(stddev))
# Print out correlation between first and second column
corr = ____
print("Correlation: " + str(corr))
@solution
avg = np.mean(np_baseball[:,0])
print("Average: " + str(avg))
# Print median height
med = np.median(np_baseball[:,0])
print("Median: " + str(med))
# Print out the standard deviation on height
stddev = np.std(np_baseball[:,0])
print("Standard Deviation: " + str(stddev))
# Print out correlation between first and second column
corr = np.corrcoef(np_baseball[:,0], np_baseball[:,1])
print("Correlation: " + str(corr))
@sct
msg = "You shouldn't change or remove the predefined `avg` variable."
Ex().check_object("avg", missing_msg=msg).has_equal_value(incorrect_msg=msg)
missing = "Have you used `np.median()` to calculate the median?"
incorrect = "To calculate `med`, pass the first column of `np_baseball` to `numpy.median()`. The example of `np.mean()` shows how it's done."
Ex().check_correct(
check_object("med").has_equal_value(),
check_function("numpy.median", index=0, missing_msg=missing).check_args(0).has_equal_value(incorrect_msg=incorrect)
)
missing = "Have you used `np.std()` to calculate the standard deviation?"
incorrect = "To calculate `stddev`, pass the first column of `np_baseball` to `numpy.std()`. The example of `np.mean()` shows how it's done."
Ex().check_correct(
check_object("stddev").has_equal_value(),
check_function("numpy.std", index=0, missing_msg=missing).check_args(0).has_equal_value(incorrect_msg=incorrect)
)
missing = "Have you used `np.corrcoef()` to calculate the correlation?"
incorrect1 = "To calculate `corr`, the first argument to `np.corrcoef()` should be the first column of `np_baseball`, similar to how did it before."
incorrect2 = "To calculate `corr`, the second argument to `np.corrcoef()` should be the second column of `np_baseball`. Instead of `[:,0]`, use `[:,1]` this time."
Ex().check_correct(
check_object("corr").has_equal_value(),
check_function("numpy.corrcoef", index=0, missing_msg=missing).multi(
check_args(0, missing_msg=incorrect1).has_equal_value(incorrect_msg=incorrect1),
check_args(1, missing_msg=incorrect2).has_equal_value(incorrect_msg=incorrect2)
)
)
success_msg("Great work! You've built a solid foundation - now it's time to use all of your new data science skills to solve more challenges and make an impact.")