Skip to content

mjr390/Mobile_Game_Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

The goal of this project was to use pandas to examine sales and demographic data of an online gaming company. The dataset, which is in the Resources folder, contains data about the customers of the company and the items purchased. Included in this repo is this README contining a short analysis of the findings, the code used, and multiple data frames created from the data, and an ipynb file containg the code that can be run.

What the code does:

  • Loads in the csv and converts it to a data frame

  • Finds the total number of unique players, and makes a data frame of this information

  • Finds the total number if unique items purchased, the average price, number of purchases, and total revenue of the dataset. This info is converted into a readable data frame

  • Creates a data frame breaking down the players by gender and another data frame showing sales info by gender

  • Creates a data frame showing the player base by age, and another data frame with purchase statistics by age

  • Creates a data frame showing the top 5 spenders

  • Creates a data frame showing the most popular items and a data frame showing the most profitable items

Note: This project was originally part of a repo with multiple projects, it has been moved here to viewed on its own. Original location can be found here: https://github.com/mjr390/Pandas_In_Jupyter/tree/master/Heroes_of_Pymoli

Heroes Of Pymoli Data Analysis

1. While the vast majority of players are male, their average purchase price per item ($3.02), is lower than average purchase price of all players ($3.05).
2. The most popular item ("Oathbreaker, Last Hope of the Breaking Storm") is also the most profitable. The second ("Nirvana") and third ("Fiery Glass Crusader") most profitable items are also in the top 4 most popular items.
3. Lisosia93 has spent the most money on items ($18.96). Followed by Idastidru52 at ($15.45) and Chamjask73 at($13.83).
import pandas as pd
import numpy as np
file_to_load = "Resources/purchase_data.csv"
df = pd.read_csv(file_to_load)
df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase ID SN Age Gender Item ID Item Name Price
0 0 Lisim78 20 Male 108 Extraction, Quickblade Of Trembling Hands 3.53
1 1 Lisovynya38 40 Male 143 Frenzied Scimitar 1.56
2 2 Ithergue48 24 Male 92 Final Critic 4.88
3 3 Chamassasya86 24 Male 100 Blindscythe 3.27
4 4 Iskosia90 23 Male 131 Fury 1.44

Player Count

#Calcualte the total number of players
#Use len() on the unque values in "SN" to find the number of unique players, put that value into a dataframe and display it
totPlayers = len(df["SN"].unique())
totPlayersVar = [{"Total Players":totPlayers}]
totPlayersDF = pd.DataFrame(totPlayersVar)
totPlayersDF
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Total Players
0 576

Purchasing Analysis (Total)

#Find the total number of of Unique Items, Number of Purchases, Total Revenue, and the Average Purchase Price
#use len() on the unique values in "Item Name" and set that value to a variable
uniItems = len(df["Item Name"].unique())

#use .mean() on the "Price" column to find the average cost per item and set to a varibale
#use round() to round the number to 2 decimal places beacuse the value is a float
#and then create a new variable for the value as a string
avgPrice = df["Price"].mean()
avgPrice = round(avgPrice,2)
avgPriceDis = ("$" + str(avgPrice))
#use .sum() on "Price" column to find the sum of all it's values, use round() bacuse it is a float
#and create a new variable with the value as a string to display
totRev = df["Price"].sum()
totRevDis = ("$" + str(totRev))
totNumOfPur = df["Item ID"].count()
#Create a DataFrame to display the results and order them
Purchasing_Analysis = pd.DataFrame([{"Number of Unique Items":uniItems, "Average Price": avgPriceDis, "Number of Purchases": totNumOfPur
                                   , "Total Revenue":totRevDis}])
Purchasing_Analysis_Ordered = Purchasing_Analysis[["Number of Unique Items", "Average Price", "Number of Purchases", "Total Revenue"]]
Purchasing_Analysis_Ordered
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Number of Unique Items Average Price Number of Purchases Total Revenue
0 179 $3.05 780 $2379.77

Gender Demographics

# Find the Percentage and Count of Male Players, Percentage and Count of Female Players, and Percentage and Count of Other / Non-Disclosed
#create variables for the total count of each gender to use to find the % they are of the total player base
malePlayerCount = len(df.loc[df["Gender"] == "Male"])
femalePlayerCount = len(df.loc[df["Gender"] == "Female"])
otherPlayerCount = len(df.loc[df["Gender"] == "Other / Non-Disclosed"])
#Use the variables from the lasst step to create variables for each gender's % of the total
malePlayPer = round(malePlayerCount/totPlayers*100,2)
fePlayPer = round(femalePlayerCount/totPlayers*100,2)
othPlayPer = round(otherPlayerCount/totPlayers*100,2)

#use value_counts() to find how many of each gender there are and set to a variable
GenderCounts = df["Gender"].value_counts()
#create a DataFrame for the found values and labels
GenderCounts_df = pd.DataFrame(GenderCounts)
#Add in a new column with the % values found earlier and create a new DataFrame with the columns in a different order for display
GenderCounts_df["Percentage of Players"]=[malePlayPer, fePlayPer, othPlayPer]
GenderCounts_df_ordered = GenderCounts_df[["Percentage of Players", "Gender"]]

GenderCounts_df_ordered
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Percentage of Players Gender
Male 113.19 652
Female 19.62 113
Other / Non-Disclosed 2.60 15

Purchasing Analysis (Gender)

# Find the purchase count, Average Purchase Price, Total Purchase Value, and Normalized Totals by gender

#Run basic calculations to find the needed values
mPurCount = len(df.loc[df["Gender"] == "Male"])
fPurCount = len(df.loc[df["Gender"] == "Female"])
oPurCount = len(df.loc[df["Gender"] == "Other / Non-Disclosed"])

#For each gender's average price, locate the values for each gender and take the mean
#use round() to reduce each to 2 decimal places. create a new variavle to display the value in a DataFrame formatted as a string
mAvgPrice = df["Price"].loc[df["Gender"] == "Male"].mean()
mAvgPrice = round(mAvgPrice,2)
mAvgPrice_display = ("$" + str(mAvgPrice))

fAvgPrice = df["Price"].loc[df["Gender"] == "Female"].mean()
#The Female average ends in a '0' so a different method of formatting must be used
fAvgPrice = "%.2f" % round(fAvgPrice, 2)
fAvgPrice_diaplay = ("$" + str(fAvgPrice))

oAvgPrice = df["Price"].loc[df["Gender"] == "Other / Non-Disclosed"].mean()
oAvgPrice = round(oAvgPrice,2)
oAvgPrice_display = ("$" + str(oAvgPrice))

#Use sum() instead of mean() to find the total Purchase value for each gender and assign to a variable
#Create a new variable to use in the DataFrame
mTotPrice = df["Price"].loc[df["Gender"] == "Male"].sum()
mTotPrice_display = ("$" + str(mTotPrice))

fTotPrice = df["Price"].loc[df["Gender"] == "Female"].sum()
fTotPrice_display = ("$" + str(fTotPrice))

oTotPrice = df["Price"].loc[df["Gender"] == "Other / Non-Disclosed"].sum()
oTotPrice_display = ("$" + str(oTotPrice))

#Find the Normalized totals for each gender by dividing the total price variable by the purchase count
#Round each to 2 decimal places and then create a new variable to display in the DataFrame
mNorTot = mTotPrice/mPurCount
mNorTot = round(mNorTot,2)
mNorTot_display = ("$" + str(mNorTot))

fNorTot = fTotPrice/fPurCount

fNorTot = "%.2f" % round(fNorTot, 2)
fNorTot_display = ("$" + str(fNorTot))

oNorTot = oTotPrice/oPurCount
oNorTot = round(oNorTot,2)
oNorTot_display = ("$" + str(oNorTot))

#Create the DataFrame for the values
PA_Gender_df = pd.DataFrame({
    "Gender" : ["Male", "Female", "Other / Non-Disclosed"],
    "Purchase Count" : [mPurCount, fPurCount, oPurCount],
    "Average Purchase Price" : [mAvgPrice_display, fAvgPrice_diaplay, oAvgPrice_display],
    "Total Purchase Value" : [mTotPrice_display, fTotPrice_display, oTotPrice_display],
    "Normalized Totals" : [mNorTot_display, fNorTot_display, oNorTot_display]
})
#Reorder the DataFrame, Set the index column to the genders, set the index to appear in alphabetical order and display the DataFrame
PA_Gender_df = PA_Gender_df[["Gender", "Purchase Count", "Average Purchase Price", "Total Purchase Value", "Normalized Totals"]]
PA_Gender_df = PA_Gender_df.set_index("Gender")
PA_Gender_df = PA_Gender_df.sort_index(ascending=True)
PA_Gender_df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Average Purchase Price Total Purchase Value Normalized Totals
Gender
Female 113 $3.20 $361.94 $3.20
Male 652 $3.02 $1967.64 $3.02
Other / Non-Disclosed 15 $3.35 $50.19 $3.35

Age Demographics

#create bins and corresponding labels
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]


#create a data frame off of the "Age" column in the data set
new_age_df = pd.DataFrame(df["Age"])

#create a new row with the ages grouped using .cut and the bins and labels
new_age_df["Age Group"] = pd.cut(new_age_df["Age"], age_bins, labels=group_names)

#group the data frame by "Age Group"
new_age_df = new_age_df.groupby("Age Group")

#find how many times an entry appears in a bucket
new_age_df.count()

#turn the found count into a data frame
new_age_df = pd.DataFrame(new_age_df.count())

#rename the columns
new_age_df = new_age_df.rename(columns={"Age":"Total Count"})

#add a new column for the "Percent of PLayers" by dividing the count in each bucket by the total number of players and multiply by 100
#round the result to 2 decimal places
new_age_df["Percent of Players"] = round(new_age_df["Total Count"]/totPlayers*100, 2)

#rearrange the columns
new_age_df = new_age_df[["Percent of Players", "Total Count"]]

#delete the index title
del new_age_df.index.name

#display the results
new_age_df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Percent of Players Total Count
<10 3.99 23
10-14 4.86 28
15-19 23.61 136
20-24 63.37 365
25-29 17.53 101
30-34 12.67 73
35-39 7.12 41
40+ 2.26 13

Purchasing Analysis (Age)

#create a new variable holding the "Age" and "Price" columns from the data set
paAge_df = df[["Age", "Price"]]
paAge_df.head()


#create bins and corresponding labels
age_bins = [0, 9.90, 14.90, 19.90, 24.90, 29.90, 34.90, 39.90, 99999]
group_names = ["<10", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40+"]

#add age groups to the data frame useing .cut on the "Age" column and the age_groups and group_names
paAge_df["Age Groups"] = pd.cut(paAge_df["Age"], age_bins, labels=group_names)

#groupby the new column and save to a variable to use later to find the total count of each bin
grouped_paAge = paAge_df.groupby(["Age Groups"])

#use .count() on the created variable to find how many values are in each bin
#save to a variable as a data frame to create a row to be used later
findPurCount = pd.DataFrame(grouped_paAge.count())

#group the data frame by the new column
paAge_df = paAge_df.groupby("Age Groups")

#take the sum of each bin and save to a data frame
paAge_df.sum()
paAge_df2 = pd.DataFrame(paAge_df.sum())

#add a column to the data frame which is the column from the data frame that .count() was used on
paAge_df2["Purchase Count"] = findPurCount["Age"]

#remove the unneeded columns
paAge_df2 = paAge_df2[["Price", "Purchase Count"]]

#rename columns
paAge_df2 = paAge_df2.rename(columns={"Price":"Total Purchase Value"})

#add columns to the data frame for "Average Purchase Price" and "Normalized Totals" doing the calcs for each
paAge_df2["Average Purchase Price"] = round(paAge_df2["Total Purchase Value"]/paAge_df2["Purchase Count"],2)
paAge_df2["Normalized Totals"] = round(paAge_df2["Total Purchase Value"]/paAge_df2["Purchase Count"],2)

#reorganize the columns to make more sense
paAge_df2 = paAge_df2[["Purchase Count", "Average Purchase Price", "Total Purchase Value", "Normalized Totals"]]

#check the data types of the columns for formatting
paAge_df2.dtypes

#Changethe format on the columns that are floats to represent money
pd.options.display.float_format = '${:,.2f}'.format

#Delete the index name
del paAge_df2.index.name

#display the data frame
paAge_df2
C:\Users\Mike\Anaconda3\envs\PythonData\lib\site-packages\ipykernel_launcher.py:11: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  # This is added back by InteractiveShellApp.init_path()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Average Purchase Price Total Purchase Value Normalized Totals
<10 23 $3.35 $77.13 $3.35
10-14 28 $2.96 $82.78 $2.96
15-19 136 $3.04 $412.89 $3.04
20-24 365 $3.05 $1,114.06 $3.05
25-29 101 $2.90 $293.00 $2.90
30-34 73 $2.93 $214.00 $2.93
35-39 41 $3.60 $147.67 $3.60
40+ 13 $2.94 $38.24 $2.94

Top Spenders

#use vlaue counts on the "SN" column to find how many items each player bought and save to a variable
purCount = df["SN"].value_counts()
purCount.head()

#use groupby to seperate the data into fields by SN
findTotPurValue = df.groupby(["SN"])
findTotPurValue.count().head()

#find the sum of price for each SN and save to a variable
totSpendPerValue = findTotPurValue["Price"].sum()

totSpendPerValue.head(10)

#create a new DataFrame with the new variables
Top_Spenders = pd.DataFrame({
    "Purchase Count":purCount,
    "Total Purchase Value": totSpendPerValue
})
Top_Spenders.head(10)

#Add ac column for average price by dividing the values in "Total Purchase Value" by those in "Purchase Count"
#round the findings to 2 decimal places
Top_Spenders["Average Purchase Price"] = round((Top_Spenders["Total Purchase Value"]/Top_Spenders["Purchase Count"]),2)
Top_Spenders.head(10)

#format the columns with floats data into currency format
pd.options.display.float_format = '${:,.2f}'.format
Top_Spenders.head(10)

#Name the index in the DataFrame
Top_Spenders.index.name = "SN"
Top_Spenders.head(10)

#Sort the DataFrame by highest "Total Purchase Value" on top
Top_Spenders_Sorted = Top_Spenders.sort_values("Total Purchase Value", ascending=False)
Top_Spenders_Sorted.head()

#Rearrange the columns
Top_Spenders_Sorted = Top_Spenders_Sorted[["Purchase Count", "Average Purchase Price", "Total Purchase Value"]]

#display the finished data frame
Top_Spenders_Sorted.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Average Purchase Price Total Purchase Value
SN
Lisosia93 5 $3.79 $18.96
Idastidru52 4 $3.86 $15.45
Chamjask73 3 $4.61 $13.83
Iral74 4 $3.40 $13.62
Iskadarya95 3 $4.37 $13.10

Most Popular Items

#create a new data frame by taking the "Item Id", "Item Name", and ""Price columns from the main data frame
popItems_df = df[["Item ID", "Item Name", "Price"]]
popItems_df.head()

#find how many of each Item ID exists and set to a variable
itemPurCount = df["Item ID"].value_counts()
itemPurCount.head()

#use the last variable to create a new dataframe
itemNumPurchases = pd.DataFrame(itemPurCount)
#Change the column names
itemNumPurchases = itemNumPurchases.rename(columns={"Item ID": "Purchase Count"})
#name the index
itemNumPurchases.index.name = "Item ID"
itemNumPurchases.head()

#group the data frame popItems by "Item ID" and "Index"
grouped_Items = popItems_df.groupby(["Item ID", "Item Name"])
#create a data frame for grouped_Items with the sum of it's "Price" column
grouped_Items_df = pd.DataFrame(grouped_Items["Price"].sum())
grouped_Items_df.head()

#merge the grouped_Items_df with the itemNumPurchases by their indexes
mergedPopItems = pd.merge(grouped_Items_df, itemNumPurchases, left_index=True, right_index=True)
mergedPopItems.head()

#create a new variable holding the "Price" and correct purchase count columns from the merged data frame
correctMergedPopItems = mergedPopItems[["Price", "Purchase Count"]]
correctMergedPopItems.head()

#create a new varable with the value of the last data frame, but change the column titles
correctMergedPopItTitles = correctMergedPopItems.rename(columns = {"Price":"Total Purchase Value", "Purchase Count_y":"Purchase Count"})
correctMergedPopItTitles

#add a new column called "Item Price" to shw the price of each colum by dividing the item's total count
correctMergedPopItTitles["Item Price"] = (correctMergedPopItTitles["Total Purchase Value"]/correctMergedPopItTitles["Purchase Count"])
correctMergedPopItTitles.head()

#sort by the "Purchase Count" column and display in decending order, set to a new variable
mergedPopIt_sorted = correctMergedPopItTitles.sort_values("Purchase Count", ascending=False)
mergedPopIt_sorted.head()

#reorder the columns
mergedPopIt_sorted = mergedPopIt_sorted[["Purchase Count", "Item Price", "Total Purchase Value"]]
mergedPopIt_sorted.head()

#Change the columns displaying money to a currency format by changing the format of all columns that are floats
pd.options.display.float_format = '${:,.2f}'.format
mergedPopIt_sorted.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Item Price Total Purchase Value
Item ID Item Name
178 Oathbreaker, Last Hope of the Breaking Storm 12 $4.23 $50.76
145 Fiery Glass Crusader 9 $4.58 $41.22
108 Extraction, Quickblade Of Trembling Hands 9 $3.53 $31.77
82 Nirvana 9 $4.90 $44.10
19 Pursuit, Cudgel of Necromancy 8 $1.02 $8.16

Most Profitable Items

#sort the most popular items data frame by the "Total Purchase Value" and show in decending order to find the most profitable item
#set to a new variable and display the variable
highProfItems = mergedPopIt_sorted.sort_values("Total Purchase Value", ascending=False)
highProfItems.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Purchase Count Item Price Total Purchase Value
Item ID Item Name
178 Oathbreaker, Last Hope of the Breaking Storm 12 $4.23 $50.76
82 Nirvana 9 $4.90 $44.10
145 Fiery Glass Crusader 9 $4.58 $41.22
92 Final Critic 8 $4.88 $39.04
103 Singed Scalpel 8 $4.35 $34.80

About

Jupyter Notebook file that imports a csv to analyze the data of a gaming company

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors