
H2-data/Edible_Earth_Museum_Distributions


The Museum of Edible Earth


Important Notes:

  • The data comes from a real collection exhibit; however, most of it is publicly available on the exhibit's website and catalog. I have received permission from Masharu, the exhibit curator, to use the visuals and dashboards in my portfolio. The master dataset is not shown for privacy reasons.
  • Files are labelled 01-04 and can be read in numeric order. The dashboard, the last step in the process, is shown as a screenshot and linked below.

Scenario and Objective:

The Museum of Edible Earth is a collection exhibit created and curated by Masharu Studios. It brings together mineral samples associated with geophagy, the practice of eating rocks and earth for health or cultural reasons. I have been asked to answer the following questions:

  • What is the distribution of collection samples over the years?

  • Where were the samples collected?

  • What are the distributions of taste and texture?

  • What are the distributions of shape, color and composition?

Data Report:

Dashboard screenshot: The Museum of Edible Earth

To interact with the dashboard, see the Tableau section of the project, linked here:

Dashboard

Data Preprocessing:

Aside from generic cleaning (outliers, duplicates and missing values), this data presented some unique preprocessing challenges. The taste and texture descriptions were collected from public comment sections on the website and displayed as a word cloud on each sample's catalog page, like so:

Image: word cloud of taste and texture descriptions on a sample's catalog page

This presented the following problems:

  1. Because the comment descriptions are so varied, there are over 600 unique descriptions for a relatively small collection of minerals, far too many to visualize usefully. To solve this, I created shortlists of 20 tastes and 20 textures and, in Excel, assigned each unique description to values from those shortlists. A few examples are shown below.
Image: examples of raw descriptions assigned to shortlist tastes and textures

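  2. Next, I had to map the shortlisted tastes and textures back onto the original data. The following code replaces each sample's original descriptions with the combination of shortlist tastes and textures they correspond to.
```python
import pandas as pd

# df2: the Excel shortlist, mapping each raw 'Description' to shortlist 'Taste'/'Texture' values
# df3: the scraped samples, with comma-separated word-cloud descriptions in 'Tags'
taste_map = df2.set_index('Description')['Taste'].to_dict()
texture_map = df2.set_index('Description')['Texture'].to_dict()

def process_column(df, target_map):
    """Replace each sample's raw tags with the unique shortlist values they map to."""
    return (
        df['Tags'].str.split(',').explode().str.strip()  # one row per raw tag
        .map(target_map).dropna()                         # translate; drop unmapped tags
        .groupby(level=0).unique()                        # de-duplicate per sample
        .str.join(', ')                                   # back to one string per sample
    )

# Samples with no mapped tags end up as NaN (shown as N/A below)
df3['Taste'] = process_column(df3, taste_map)
df3['Texture'] = process_column(df3, texture_map)
```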
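To make the mapping step concrete, here is a small, self-contained run of the same `process_column` logic on made-up data. The lookup table, descriptions and tags below are hypothetical stand-ins for `df2` and `df3`, not rows from the real dataset. It also counts the unique raw descriptions, which is the number that exceeded 600 in the real data.

```python
import pandas as pd

# Hypothetical shortlist lookup (stands in for df2): several raw descriptions
# can collapse onto the same shortlist value ('tart' and 'sour' -> 'Sour').
df2 = pd.DataFrame({
    "Description": ["bitter", "tart", "sour", "dusty", "dry"],
    "Taste": ["Bitter", "Sour", "Sour", None, None],
    "Texture": [None, None, None, "Dusty", "Dry"],
})

# Hypothetical scraped samples (stands in for df3).
df3 = pd.DataFrame({
    "SKU": ["CG001A", "CH001A"],
    "Tags": ["bitter, tart, sour, dusty, dry", "glacier"],
})

def process_column(df, target_map):
    """Replace each sample's raw tags with the unique shortlist values they map to."""
    return (
        df["Tags"].str.split(",").explode().str.strip()
        .map(target_map).dropna()
        .groupby(level=0).unique()
        .str.join(", ")
    )

# Number of distinct raw descriptions before consolidation.
n_unique = df3["Tags"].str.split(",").explode().str.strip().nunique()

taste_map = df2.set_index("Description")["Taste"].to_dict()
texture_map = df2.set_index("Description")["Texture"].to_dict()
df3["Taste"] = process_column(df3, taste_map)
df3["Texture"] = process_column(df3, texture_map)
print(n_unique)
print(df3)
```

The second sample has no tag in the lookup, so it is dropped by `dropna()` and comes back as NaN after index alignment, which matches the N/A rows in the output table.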

For analysis and visualization, each value should appear only once per cell, so the taste and texture columns are later split using the pandas method 'explode': each comma-separated value gets its own row, and the other columns are duplicated alongside it. After stripping whitespace and dropping duplicates, the mapped columns looked like this:

| SKU | Name | Taste | Texture |
|---|---|---|---|
| CG002C | Mabele Smoked Clay | N/A | N/A |
| CG001A | Clay Dappermarkt | Bitter, Sour | Dusty, Dry |
| CH001A | Gletschermilch | N/A | N/A |
| CI001A | Salty Calaba | Experience, Earthy, Milky, Herbal, Bitter, Sour, Salty, Sweet | Chalky, Experience, Dense, Creamy, Crunchy, Dry, Soft, Smooth, Airy, Damp, Dusty, Powdery, Damp, Grainy, Sticky |
| CI002A | Salty Calaba | Milky, Earthy, Herbal, Bitter, Sour, Salty, Tasty | Creamy, Crunchy, Melty, Soft, Damp, Dry, Dusty, Sticky |
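The explode step described above can be sketched as follows. This is a minimal illustration, assuming the comma-joined format shown in the table; the three rows are simplified from that table, and one duplicate "Sour" was added by hand to show `drop_duplicates` at work.

```python
import pandas as pd

# Simplified rows in the shape of the table above (duplicate 'Sour' added for illustration).
mapped = pd.DataFrame({
    "SKU": ["CG002C", "CG001A", "CI002A"],
    "Taste": [None, "Bitter, Sour", "Milky, Earthy, Bitter, Sour, Sour"],
})

# One row per (sample, taste): split the joined string into a list, explode the
# list into rows (SKU is repeated), trim whitespace, then drop N/A rows and duplicates.
taste_long = (
    mapped.assign(Taste=mapped["Taste"].str.split(","))
    .explode("Taste")
    .assign(Taste=lambda d: d["Taste"].str.strip())
    .dropna(subset=["Taste"])
    .drop_duplicates()
    .reset_index(drop=True)
)
print(taste_long)
```

Keeping the data in this long format means a plain `value_counts()` on the taste or texture column gives the number of samples per value.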

To see the full scraping and cleaning process, please see the scraping section and the preprocessing section of the project, linked below.

Dictionary
Scraping
Preprocessing

Results and Observations:

Let's go through and answer each of the data questions:

  • When, where and how were the minerals acquired? Which countries were they most commonly acquired from?
Dashboard screenshot: The Museum of Edible Earth

Most of the samples were collected in 2018 and from 2019 to 2021. Most were acquired online or received as gifts, and the vast majority came from Russia, Suriname and Ukraine.
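Counts like these come down to frequency tables per column. The sketch below assumes hypothetical column names ('Year', 'Acquisition', 'Country') and uses made-up placeholder rows, not the real catalog.

```python
import pandas as pd

# Hypothetical sample table; real column names and values may differ.
samples = pd.DataFrame({
    "Year": [2018, 2018, 2019, 2020, 2021, 2017],
    "Acquisition": ["Internet", "Gift", "Internet", "Gift", "Internet", "Other"],
    "Country": ["Country A", "Country B", "Country A", "Country C", "Country A", "Country B"],
})

# Samples per year in chronological order (input for a bar or line chart).
per_year = samples["Year"].value_counts().sort_index()

# Acquisition methods and source countries, most frequent first.
top_methods = samples["Acquisition"].value_counts()
top_countries = samples["Country"].value_counts().head(3)

print(per_year, top_methods, top_countries, sep="\n\n")
```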

  • What is the distribution of taste? What are the most dominant tastes?
Image: distribution of tastes

The top 5 tastes are Earthy, Milky, Herbal, Bitter and Sweet. I excluded 'Tasty' and 'Experience', since they describe an overall impression rather than a specific taste.
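A top-5 ranking that leaves out the impression words can be computed with a filter before counting. This is a sketch on hypothetical counts (the numbers below are invented), assuming the long format with one taste per row.

```python
import pandas as pd

# Hypothetical taste counts, expanded to one row per mention.
counts = {"Earthy": 7, "Tasty": 6, "Experience": 6, "Salty": 5,
          "Sour": 4, "Sweet": 3, "Milky": 2, "Herbal": 1}
tastes = pd.Series([t for t, n in counts.items() for _ in range(n)])

# Words that describe an overall impression rather than a taste.
EXCLUDED = {"Tasty", "Experience"}

top5 = tastes[~tastes.isin(EXCLUDED)].value_counts().head(5)
print(top5)
```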

  • What is the distribution of texture? What are the most dominant textures?
Image: distribution of textures

The top 5 textures are Soft, Hard, Damp, Creamy and Crunchy. Here too I excluded 'Tasty' and 'Experience', since they describe an overall impression rather than a texture.
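Raw counts can also be expressed as the share of samples that mention each texture, which is easier to compare across attributes. This is one possible variant rather than the method used in the project; the SKUs and textures below are hypothetical.

```python
import pandas as pd

# Hypothetical long-format texture table: one row per (sample, texture).
texture_long = pd.DataFrame({
    "SKU":     ["S1", "S1", "S2", "S2", "S3", "S3", "S3"],
    "Texture": ["Soft", "Hard", "Soft", "Damp", "Soft", "Experience", "Hard"],
})

EXCLUDED = {"Tasty", "Experience"}
n_samples = texture_long["SKU"].nunique()

# nunique() counts each sample at most once per texture, then divide by sample count.
share = (
    texture_long.loc[~texture_long["Texture"].isin(EXCLUDED)]
    .groupby("Texture")["SKU"].nunique()
    .div(n_samples)
    .sort_values(ascending=False)
)
print(share)
```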

  • What are the distributions of shape, color and composition?
Image: distributions of shape, color and composition

The most common sample attributes are a clay-like composition, a white color and a raw shape.
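The most common value per attribute, with its share of samples, can be summarized in one small table. The column names and rows below are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical attribute table; column names are assumptions.
attrs = pd.DataFrame({
    "Shape": ["Raw", "Raw", "Baked", "Raw"],
    "Color": ["White", "Brown", "White", "White"],
    "Composition": ["Clay", "Clay", "Sand", "Chalk"],
})

# For each column: the most frequent value and the fraction of samples that have it.
summary = pd.DataFrame({
    "most_common": attrs.apply(lambda col: col.value_counts().idxmax()),
    "share": attrs.apply(lambda col: col.value_counts(normalize=True).max()),
})
print(summary)
```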

To see all comments, conclusions and Python visuals, please see the analysis section of this project, linked below:

Analysis

About

In this end-to-end data project, I clean and analyze data from the Museum of Edible Earth collection exhibit (with prior consent to publish the dashboards) to find trends and distributions in the qualities and acquisition methods of the mineral samples on display.
