Mushroom Classification: Will I die if I eat this? Classifying mushrooms by their different physical features using machine learning techniques in order to determine if they are poisonous or not.
Mushrooms are the Mariana Trench of the land. We only know so much about them, and one reason is that they are so hard to identify. With many species looking alike, it can be hard to figure out which one we're looking at. In recent years mycology, the study of fungi, has made leaps in what we know about these mysterious organisms by collecting data, finding new (and even rare) species, and documenting as much as possible. We now know more than ever. There's one issue, though. There are many amateur mushroom enthusiasts and people interested in cooking wild mushrooms, but because mushrooms are so difficult to identify, it's hard to tell whether a given one is edible. Many species look the same, with the same cap shape, color, and size, yet differ in spore color or stem texture. The idea is to have computers learn which mushrooms may or may not be edible, in order to better inform us about the mystery of fungi.
The data for this project comes from Kaggle.com (https://www.kaggle.com/uciml/mushroom-classification), where it was originally taken from the UCI Machine Learning Repository. The data set consists of hypothetical samples corresponding to “23 species of gilled mushrooms in the Agaricus and Lepiota Family Mushroom”. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended; in the data set, this last group is combined with the poisonous class. Each sample is described by 22 features: cap-shape, cap-surface, cap-color, bruises, odor, gill-attachment, gill-spacing, gill-size, gill-color, stalk-shape, stalk-root, stalk-surface-above-ring, stalk-surface-below-ring, stalk-color-above-ring, stalk-color-below-ring, veil-type, veil-color, ring-number, ring-type, spore-print-color, population, habitat.
The plan is to evaluate several classification methods and see which one performs best: SVM, a neural network, logistic regression, and possibly others if there is time to experiment with them.
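As a rough sketch of how such a comparison might look, the snippet below scores the three methods with 5-fold cross-validation in scikit-learn. Everything specific here is an assumption for illustration: the feature matrix is synthetic 0/1 data standing in for the encoded mushroom features, the labeling rule is made up, and the hyperparameters (RBF kernel, one hidden layer of 16 units) are arbitrary starting points, not tuned choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the encoded mushroom features: 200 samples, 8 binary columns.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 8))
# Made-up rule so the labels are learnable; NOT a real edibility rule.
y = (X[:, 0] | X[:, 3]).astype(int)

# Candidate models; hyperparameters are illustrative defaults, not tuned values.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="rbf"),
    "neural_network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

# Print the models from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Accuracy is only one way to define "best". Since mistaking a poisonous mushroom for an edible one is the costly error, it may be worth also comparing recall on the poisonous class, for example by passing scoring="recall" to cross_val_score.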
This project will be done by both Matthew Becker and Aidan DeBolt.
I have already downloaded the data set. Since every value is stored as a string, I have converted the categories to numeric codes for ease of use. The next step is to plot everything to see what we're looking at.
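One possible way to do the string-to-number conversion is one-hot encoding, sketched below with only the standard library. This is not necessarily the encoding already used for this project; it is shown because plain integer codes imply an ordering that categories like odor do not have. The sample rows are made up, and the "class" column name and the "p" code for poisonous are assumptions about the layout of the Kaggle CSV. The helper one_hot_encode is a hypothetical name.

```python
import csv
import io

# Tiny made-up sample laid out like the Kaggle CSV (single-letter category codes).
# These rows are illustrative only and are not taken from the real file.
SAMPLE = """class,cap-shape,odor
p,x,p
e,x,a
e,b,l
p,x,p
"""

def one_hot_encode(rows, label_column="class"):
    """Return (X, y, names) with one 0/1 column per (feature, value) pair."""
    feature_columns = [c for c in rows[0] if c != label_column]
    # Collect every distinct (column, value) pair in a stable, sorted order.
    pairs = sorted({(c, r[c]) for r in rows for c in feature_columns})
    # Each row gets a 1 where its value matches the pair and a 0 elsewhere.
    X = [[1 if r[c] == v else 0 for c, v in pairs] for r in rows]
    # Assumption: "p" marks poisonous; encode it as 1, anything else as 0.
    y = [1 if r[label_column] == "p" else 0 for r in rows]
    names = [f"{c}_{v}" for c, v in pairs]
    return X, y, names

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
X, y, names = one_hot_encode(rows)
print(names)
print(X[0], y[0])
```

To run this on the real file, the io.StringIO(SAMPLE) argument would be replaced with an open file handle for the downloaded CSV.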
By Nov. 9th/10th I'd like to have the data all nicely plotted and cleaned. By Nov. 16th/17th I'd like to have the learning algorithms ready, perhaps with a few bugs remaining. By Nov. 20th I would like to have the bugs figured out and to have begun the write-up. A bit before the presentation on Dec. 5th/6th I would like to have the final plots and write-up completed.