Analysis of the Properties of Mushrooms

Lab One: Exploring Table Data

Luke Wood, Justin Ledford, Traian Pop

Introduction

Mushrooms are a type of fungus found in many, if not all, parts of the world, and they have been used throughout history in a variety of fields. Fungi are harvested for both their beneficial and harmful properties. An entire branch of biology, mycology, is dedicated to studying how fungi develop and evolve. After surveying candidate data sets, we chose one hosted on the UCI Machine Learning Repository because of the variety of attributes it records.

Business Understanding

Data Background

The data we have selected describes 23 species of gilled mushrooms in the *Agaricus* and *Lepiota* families. It was originally collected as part of a field guide, in order to find a more efficient and accurate way to tell whether a mushroom is edible or poisonous. The data set is hosted on the UCI Machine Learning Repository; it was drawn from The Audubon Society Field Guide to North American Mushrooms (1981) and donated by Jeff Schlimmer.

The Audubon Society collects information on over 700 different species of North American mushrooms and compiles it in a concise and easy-to-understand handbook. Every aspect of a mushroom is detailed so as to make it as simple as possible for anyone to identify one in the field.

We believe that this information could be used to create an effective classifier. All of the attributes, excluding poisonousness, are easily observable and are highly correlated with whether a mushroom is poisonous or edible.

Purpose

This kind of information is vital to many industries, ranging from tourism to healthcare. Knowing whether a mushroom is safe to use could save a starving hiker's life or speed up the development of medicines derived from mushrooms. Although most poisonous mushrooms cause only minor symptoms such as vomiting and diarrhea, children or animals can develop symptoms such as organ damage and, in some cases, death. Identification currently involves a lot of guesswork and tedious work, and even professional mushroomers can misidentify a specimen.

We want to analyze this data in order to discern a mushroom's edibility from a combination of its properties.

Data Understanding

The picture shows a typical Blue Milk Mushroom. It has been added to help show where each attribute lies on the mushroom itself.

The following are our 23 categorical data attributes that will be examined in this report, grouped by the parts and functions of the mushroom:

Cap Information(OHE) - Attributes relating to the mushroom cap, the part of the mushroom that protects the gills from harm
- Cap Shape(OHE)
- Cap Surface(OHE)
- Cap Color(OHE)
Odor(OHE) - The aroma a mushroom gives off
Gill Information(OHE) - Attributes relating to the mushroom gills, the part of the mushroom that releases spores
- Gill Attachment(OHE)
- Gill Spacing(OHE)
- Gill Color(OHE)
- Gill Size(Binary)
Stalk Information - Attributes relating to the mushroom stalk, the part of the mushroom that holds the cap up
- Stalk Root(OHE)
- Stalk Shape(Binary)
- Stalk Surface Above Ring(OHE)
- Stalk Surface Below Ring(OHE)
- Stalk Color Above Ring(OHE)
- Stalk Color Below Ring(OHE)
Veil Information(OHE) - Attributes relating to the mushroom veil, the part of the mushroom that is used for protection of the spores
- Veil Color(OHE)
- Veil Type(Binary)
Ring Information(OHE) - Attributes pertaining to a mushroom ring, a vestigial protective covering for the mushroom
- Ring Number(OHE)
- Ring Type(OHE)
Spore Print Color(OHE) - Mushrooms use spores to reproduce, and spore color typically differs among species
Population(OHE) - How the mushroom grows in terms of clusters
Habitat(OHE) - Where the mushroom typically grows in the wild
Bruises(Binary) - Whether or not a mushroom has surface bruises
Poisonous(Binary) - Attribute labeling if a mushroom is edible or poisonous

(OHE) - One Hot Encoding

Although each attribute above could be converted to the encoding shown next to it, we decided not to do so, as our data set was not memory intensive enough to require it. Leaving the attributes in their categorical representation also made the data easier to read and understand.
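For reference, one-hot encoding turns a categorical column into one 0/1 indicator column per distinct value. The following is a minimal sketch using only the standard library; the helper `one_hot` and the sample codes are hypothetical and are not part of the notebook's code.

```python
def one_hot(values):
    """Return (categories, rows), where each row is a list of 0/1 indicators."""
    categories = sorted(set(values))
    position = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1  # exactly one indicator is set per row
        rows.append(row)
    return categories, rows


# Hypothetical sample of single-letter cap-shape codes
cap_shape = ["x", "b", "x", "f"]
categories, encoded = one_hot(cap_shape)
print(categories)
print(encoded)
```

With pandas, `pd.get_dummies` offers the same transformation on a DataFrame column.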

Data Quality

Loading & Preprocessing Data



In [1]:

    
import pandas as pd
import numpy as np
import requests

descriptors_url = 'https://raw.githubusercontent.com/LukeWoodSMU/Mushroom-Classification/master/raw_data/descriptors.txt'
descriptors = requests.get(descriptors_url).text

def get_attribute_dictionary():
    # Load every attribute from the descriptors file, mapping each single-letter code to its full name
    attr_dict = dict([(x.split(":")[0],dict([[y.split("=")[1].strip(),y.split("=")[0].strip()] for y in x.split(":")[1].split(",")])) for x in descriptors.splitlines()])
    attr_dict.pop('stalk-root', None)
    return attr_dict
    

def get_data_frame(remove_dups=False):
    attribute_names = [x.split(":")[0] for x in descriptors.splitlines()]
    df = pd.read_csv('https://raw.githubusercontent.com/LukeWoodSMU/Mushroom-Classification/master/raw_data/agaricus-lepiota.data.txt',names=attribute_names)

    for col in df.columns:
        df[col] = df[col].astype('category')

    if(remove_dups):
        df = df.drop_duplicates()
    
    df.drop('stalk-root', inplace=True, axis=1)
    
    return df
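The dictionary comprehension in `get_attribute_dictionary` is dense, so here is a step-by-step sketch of what it appears to do for a single line. The line layout (`attribute: name=code,name=code`) is inferred from the way the loader splits on `:`, `,` and `=`; the helper name and the sample line are hypothetical.

```python
def parse_descriptor_line(line):
    """Split 'attr: name=code,name=code' into (attr, {code: name})."""
    attr, _, pairs = line.partition(":")
    mapping = {}
    for pair in pairs.split(","):
        name, _, code = pair.partition("=")
        # The data file stores codes, so codes become the dictionary keys
        mapping[code.strip()] = name.strip()
    return attr.strip(), mapping


# Hypothetical line in the layout assumed above
sample = "bruises: bruises=t,no=f"
attr, mapping = parse_descriptor_line(sample)
print(attr, mapping)
```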

Below are the first 5 instances of our data set.



In [2]:

    
get_data_frame().head()









    Out[2]:






  
    
      
	poisonous	cap-shape	cap-surface	cap-color	bruises	odor	gill-attachment	gill-spacing	gill-size	gill-color	...	stalk-surface-below-ring	stalk-color-above-ring	stalk-color-below-ring	veil-type	veil-color	ring-number	ring-type	spore-print-color	population	habitat
0	p	x	s	n	t	p	f	c	n	k	...	s	w	w	p	w	o	p	k	s	u
1	e	x	s	y	t	a	f	c	b	k	...	s	w	w	p	w	o	p	n	n	g
2	e	b	s	w	t	l	f	c	b	n	...	s	w	w	p	w	o	p	n	n	m
3	p	x	y	w	t	p	f	c	n	n	...	s	w	w	p	w	o	p	k	s	u
4	e	x	s	g	f	n	f	w	b	k	...	s	w	w	p	w	o	e	n	a	g
    
  

5 rows × 22 columns

Missing Values

The data set has 2,480 missing values, all in one attribute, the stalk root. These values are most likely missing because the stalk root is the only attribute that is not visible unless the mushroom has been pulled out of the ground.

To deal with these missing values we can either eliminate the instances with missing values or eliminate the column altogether since it would be difficult to impute the values with this data set. We have decided to eliminate the column because the attribute is not relevant in the case of determining the edibility of the mushroom without removing it from the ground, which was the main thing we were interested in analyzing.
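A count like the one above can be obtained by tallying the missing-value marker per column. The sketch below assumes that missing entries are encoded as `?`, which is consistent with the stalk-root categories printed in this section; the helper `count_missing` and the sample rows are hypothetical.

```python
from collections import Counter


def count_missing(rows, columns, marker="?"):
    """Count how many times the missing-value marker appears in each column."""
    counts = Counter()
    for row in rows:
        for name, value in zip(columns, row):
            if value == marker:
                counts[name] += 1
    return dict(counts)


# Hypothetical rows: (stalk-shape, stalk-root)
columns = ["stalk-shape", "stalk-root"]
rows = [("e", "e"), ("e", "?"), ("t", "b"), ("t", "?")]
missing = count_missing(rows, columns)
print(missing)
```

Columns with no missing entries do not appear in the result at all, which makes the affected attributes easy to spot.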



In [34]:

    
def get_data_frame_with_miss(remove_dups=False):
    attribute_names = [x.split(":")[0] for x in descriptors.splitlines()]
    df = pd.read_csv('https://raw.githubusercontent.com/LukeWoodSMU/Mushroom-Classification/master/raw_data/agaricus-lepiota.data.txt',names=attribute_names)

    for col in df.columns:
        df[col] = df[col].astype('category')

    if(remove_dups):
        df = df.drop_duplicates()
            
    return df

def get_attribute_dictionary_with_miss():
    # Load every attribute from the descriptors file (stalk-root included), mapping each code to its full name
    attr_dict = dict([(x.split(":")[0],dict([[y.split("=")[1].strip(),y.split("=")[0].strip()] for y in x.split(":")[1].split(",")])) for x in descriptors.splitlines()])
    return attr_dict
    

missing_val=get_data_frame_with_miss()
missing_val['stalk-root'].head()









    Out[34]:





0    e
1    c
2    c
3    e
4    e
Name: stalk-root, dtype: category
Categories (5, object): [?, b, c, e, r]

Single Value Column

We also noticed that the veil-type column contains only one value, which makes the column irrelevant for analysis. This likely means that none of the 23 species studied has a veil type other than partial. This is interesting to see but not necessarily surprising: even though the species come from two different families, they are still fairly similar biologically.



In [35]:

    
single_val=get_data_frame()
single_val['veil-type'].head()









    Out[35]:





0    p
1    p
2    p
3    p
4    p
Name: veil-type, dtype: category
Categories (1, object): [p]
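Columns like this can be found automatically by collecting the distinct values seen in each column. This is a minimal sketch with a hypothetical helper name and made-up rows, not the approach used in the notebook.

```python
def constant_columns(rows, columns):
    """Return the names of columns that take at most one distinct value."""
    seen = {name: set() for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            seen[name].add(value)
    return [name for name in columns if len(seen[name]) <= 1]


# Hypothetical rows: (veil-type, veil-color)
columns = ["veil-type", "veil-color"]
rows = [("p", "w"), ("p", "n"), ("p", "w")]
constant = constant_columns(rows, columns)
print(constant)
```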

Repeat Data

Because there are 8,124 rows and only 23 species of mushrooms, we assumed that there would inevitably be a lot of identical rows. Although removing duplicate data generally seems like something that needs to be done, we decided that situations could arise in which the duplicate data would be useful.

To support both cases, we implemented our function that reads the DataFrame from a file with an option to remove duplicate rows.

Below is a short piece of code that counts the duplicate rows in the data.



In [36]:

    
df = get_data_frame()
total_rows = len(df)

df.drop_duplicates(inplace=True)
no_dups = len(df)

print ("Total duplicates: ", (total_rows - no_dups))









    



Total duplicates:  0

We were surprised to see that there were zero duplicates in the entire set, but once we sat down and analyzed the data more, we came up with a few possible conclusions:

1. The original data collectors had already removed the duplicates.
2. Some of the missing data ended up being the duplicate data.
3. There was never any duplicate data to begin with.



In [37]:

    
df = get_data_frame()
p = 1
for col in df:
    p *= (1/len(df[col].cat.categories))
print (p**2)









    



4.206045380432561e-28

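Under the simplifying assumption that every value of every attribute is equally likely and that the attributes are independent, the 22 remaining columns allow roughly 4.9e13 distinct combinations of values. The value printed above, about 4.2e-28 (4.2e-26%), is the chance that two given rows both match one particular combination; the chance that two given rows match each other on any combination is the square root of that, about 2.1e-14. Either way, a chance duplicate among 8,124 rows is extremely unlikely, so we are inclined to go with conclusion #3 regarding the reason for zero duplicate data.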
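A related back-of-the-envelope figure is the expected number of identical row pairs. The sketch below assumes, unrealistically, that all value combinations are equally likely and that attributes are independent, so it is only a rough lower bound on how often duplicates would occur; the function name is hypothetical.

```python
from math import comb, prod


def expected_duplicate_pairs(n_rows, category_counts):
    """Expected number of identical row pairs under a uniform, independent model.

    Each of the comb(n_rows, 2) pairs matches with probability
    1 / (number of possible value combinations).
    """
    n_combinations = prod(category_counts)
    return comb(n_rows, 2) / n_combinations


# Toy check: 4 rows and two attributes with 2 and 3 values -> 6 pairs / 6 combinations
toy = expected_duplicate_pairs(4, [2, 3])
print(toy)
```

Passing 8,124 rows and the per-column category counts of the mushroom data gives an expected count far below one, in line with the printed output.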

Initial Analysis

One of the first things we did was compute, for each attribute value, the ratio of its occurrences among the poisonous instances to its occurrences among all instances. This shows which values occur only or mostly in poisonous mushrooms.



In [38]:

    
def get_hist_data():
    attr_map = get_attribute_dictionary()
    df = get_data_frame()

    hist_data = dict([(atr,None) for atr in attr_map])
    poison_hist_data = dict([(atr,None) for atr in attr_map])

    for x in attr_map:
        counts = dict([(attr_map[x][y],0) for y in attr_map[x]])
        poison_counts = dict([(attr_map[x][y],0) for y in attr_map[x]])

        for val, poison in zip(df[x],df["poisonous"]):
            counts[attr_map[x][val]]+=1
            if(poison == "p"):
                poison_counts[attr_map[x][val]]+=1
        hist_data[x] = counts
        poison_hist_data[x] = poison_counts

    return hist_data, poison_hist_data

# tf_tpf - Term Frequency to Poison Frequency:
# for each attribute value, the share of its instances that are poisonous
def get_tf_tpf():
    data, poison_data = get_hist_data()
    tf_tpf = {}

    for val in data:
        # Skip values that never occur to avoid dividing by zero
        tf_tpf[val] = dict([(x,poison_data[val][x]/data[val][x]) for x in data[val] if data[val][x] != 0])
    return tf_tpf

tf_tpf = get_tf_tpf()

print(tf_tpf["odor"])









    



{'spicy': 1.0, 'fishy': 1.0, 'foul': 1.0, 'pungent': 1.0, 'anise': 0.0, 'musty': 1.0, 'creosote': 1.0, 'almond': 0.0, 'none': 0.034013605442176874}

As we can see, most values of the odor attribute determine edibility completely: every odor other than none occurs either only in poisonous mushrooms or only in edible ones. The only time there is a question of whether a mushroom is poisonous is when it lacks an odor, in which case about 3.4% are poisonous. From here we decided to start plotting the data to get a visual sense of the relationship between attributes.
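The same per-value ratio can be expressed more compactly with `collections.Counter`. This is a sketch on made-up codes rather than a replacement for the functions above; `poisonous_fraction` is a hypothetical name.

```python
from collections import Counter


def poisonous_fraction(values, labels, poison_label="p"):
    """For each attribute value, the share of its rows labelled poisonous."""
    totals = Counter(values)
    poisonous = Counter(v for v, lab in zip(values, labels) if lab == poison_label)
    # Counter returns 0 for values that never occur with the poison label
    return {v: poisonous[v] / totals[v] for v in totals}


# Hypothetical odor codes and class labels
odor = ["n", "n", "f", "a", "n", "f"]
label = ["e", "p", "p", "e", "e", "p"]
fractions = poisonous_fraction(odor, label)
print(fractions)
```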

Visualizations

We will be using Matplotlib's pyplot and Seaborn to plot our data.



In [8]:

    
import matplotlib.pyplot as plt
import seaborn as sns

Comparative bar charts

To get a glimpse of which attribute values could be used to determine whether a mushroom is edible or poisonous, we generated bar charts comparing the frequency of each attribute value among poisonous and edible mushrooms.

Since relationships between categorical variables are usually analyzed by looking at the frequencies of their values, we have implemented a function that collects the frequencies of each combination of values for two attributes.



In [9]:

    
def attr_freqs(attr1, attr2):
    df = get_data_frame()

    labels1 = get_attribute_dictionary()[attr1]
    labels2 = get_attribute_dictionary()[attr2]

    data = []

    for a in df[attr1].cat.categories:
        column = df[attr2][df[attr1] == a].value_counts()
        data.append(column)

    observed = pd.concat(data, axis=1)
    observed.columns = [labels1[a] for a in df[attr1].cat.categories]

    return observed


attr_freqs('odor', 'poisonous')

Since we would like to compare attributes against the poisonous and edible classes, we have created a function to plot any attribute with side by side bar charts comparing each of the values of that attribute.



In [10]:

    
def plot_comparative_data(attr, plot=True, save=False):
    data = attr_freqs(attr, 'poisonous')

    labels = get_attribute_dictionary()[attr]

    index = np.arange(data.shape[1])
    bar_width = 0.35
    opacity=0.4

    fig, ax = plt.subplots()

    plt.bar(index, data.loc['e',:].values, bar_width, align='center',
            color='b', label='edible', alpha=opacity)
    plt.bar(index + bar_width, data.loc['p',:].values, bar_width,
            align='center', color='r', label='poisonous', alpha=opacity)

    plt.xlabel('Attributes')
    plt.ylabel('Frequency')
    plt.title('Frequency by attribute and edibility ({})'.format(attr))
    plt.xticks(index + bar_width / 2, data.columns)

    plt.legend()

    plt.tight_layout()
    plt.show()
    plt.close()



In [11]:

    
plot_comparative_data('odor')

From the plot we can see that any mushroom with a foul, spicy, or fishy smell is almost certainly poisonous. Mushrooms with no smell are almost always edible, but in some rare cases they can be poisonous.

Let's take a look at spore print color.



In [12]:

    
plot_comparative_data('spore-print-color')

We can see that mushrooms with chocolate or white spore prints are usually poisonous, so it is best to avoid those. Mushrooms with black or brown spore prints are usually edible, but not always.

Determining an attribute's association with edibility

To determine the association between each attribute and edibility, we applied Pearson's chi-squared test to the frequencies of attribute values and then ranked the attributes in descending order of the chi-squared statistic. The test compares the observed counts with the counts expected under the null hypothesis that the two attributes are independent, in which case each cell's expected count is determined by its row and column totals. The statistic is

$$ \chi^2 = \sum^n_{i=1} \frac{ (O_i - E_i)^2 }{ E_i } $$

where $O_i$ is the observed count in cell $i$ of the contingency table and $E_i$ is the corresponding expected count.
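The expected counts in this statistic are conventionally derived from the margins of the contingency table. As a sketch, writing $R_r$ for the total of row $r$, $C_c$ for the total of column $c$, and $N$ for the grand total (symbols introduced here for illustration, not taken from the original notebook), the expected count under independence for the cell in row $r$ and column $c$ is

$$ E_{rc} = \frac{R_r \, C_c}{N} $$

For example, a cell whose row total is 4208 and whose column total is 400, in a table of 8124 observations, has an expected count of $4208 \times 400 / 8124 \approx 207.19$.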

With the following function we can get a contingency table of the expected and observed values of any two attributes:



In [13]:

    
def expected_data(observed):
    # Expected counts under independence: row_total * col_total / grand_total
    row_totals = observed.sum(axis=1).values
    col_totals = observed.sum(axis=0).values
    total = observed.sum().sum()

    # The outer product handles any number of rows, not only two
    expected = np.outer(row_totals, col_totals) / total

    return pd.DataFrame(expected, index=observed.index,
                        columns=observed.columns)



In [14]:

    
o = attr_freqs('odor', 'poisonous')
o



In [15]:

    
expected_data(o)









    Out[15]:






  
    
      
	almond	creosote	foul	anise	musty	none	pungent	spicy	fishy
e	207.188577	99.450517	1118.818316	207.188577	18.646972	1827.40325	132.600689	298.351551	298.351551
p	192.811423	92.549483	1041.181684	192.811423	17.353028	1700.59675	123.399311	277.648449	277.648449

Using these two tables for each attribute we can collect the chi-squared test statistic for each, and then sort them in descending order to rank the attributes by association with being poisonous or edible.
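To make the calculation concrete, here is a self-contained sketch of the Pearson chi-squared statistic for a table of counts, written with plain lists instead of DataFrames. The function name and the two toy tables are hypothetical; the code assumes that no row or column total is zero.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Rows are proportional to each other: no association
independent = [[10, 20], [20, 40]]
# Each column occurs in only one row: perfect association
associated = [[30, 0], [0, 30]]
print(chi_squared(independent))
print(chi_squared(associated))
```

The statistic is zero when the rows are proportional and grows as the observed counts move away from the independence pattern, which is why it can serve as a ranking metric.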



In [16]:

    
cat_names = get_attribute_dictionary().keys()

chisqrs = []
for cat in cat_names:
    if cat != 'poisonous':
        observed = attr_freqs(cat, 'poisonous')
        expected = expected_data(observed)
        chisqr = (((observed-expected)**2)/expected).sum().sum()
        chisqrs.append((chisqr, cat))

chisqrs = sorted(chisqrs)[::-1]
chisqrs = chisqrs[:10]
values = [d[0] for d in chisqrs]
labels = [d[1].replace("-", "\n") for d in chisqrs]

index = np.arange(len(chisqrs))
bar_width = .35
opacity=0.4

plt.title("Attributes most associated with edibility")
plt.bar(index, values, bar_width, align='center')
plt.xticks(index, labels)
plt.ylabel("Chi-squared values")
plt.xlabel("Attributes")
plt.autoscale()
plt.tight_layout()
plt.show()

As we can see from the plot, odor is the attribute most strongly associated with edibility, followed by spore print color and gill color. These rankings agree with our comparative bar charts.

Although we are not using the chi-squared statistic in the traditional way, to compute a p-value and decide whether to reject the null hypothesis of independence, it still gives us a metric for ranking the attributes by their association with edibility.

Scatterplot

Next we decided to plot a scatterplot matrix of the five attributes most associated with edibility. In order to plot categorical variables on a scatterplot we needed to add some jitter to the data. This was done by adding a random number between -0.167 and 0.167 to each categorical code.
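The jitter step in isolation can be sketched as follows, using the standard library's `random` module rather than NumPy. The helper `jitter_codes` and its `width` and `seed` parameters are hypothetical; a width of 1/3 gives offsets of at most one sixth in either direction.

```python
import random


def jitter_codes(codes, width=1 / 3, seed=None):
    """Add uniform noise in [-width/2, width/2) to each integer category code."""
    rng = random.Random(seed)
    return [code + (rng.random() - 0.5) * width for code in codes]


codes = [0, 1, 2, 2, 1]
jittered = jitter_codes(codes, seed=0)
print(jittered)
```

Because the offsets stay well below 0.5, every jittered point remains closest to its own category code, so the clusters in the plot do not overlap.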



In [19]:

    
df = get_data_frame()
for col in df:
    if col in ['odor', 'spore-print-color', 'gill-color', 'ring-type',
               'stalk-surface-above-ring']:
        df[col] = df[col].cat.codes + (np.random.rand(len(df),) - .5)/3
    elif col == 'poisonous':
        df[col] = df[col].cat.codes
    else:
        del df[col]

g = sns.pairplot(df, hue='poisonous')
plt.autoscale()
plt.tight_layout()
plt.show()
plt.close()

From the scatter plots we can clearly see how the values of certain variables separate poisonous from edible mushrooms. We can also see how combinations of two values strongly indicate one class or the other. For example, mushrooms with a fibrous stalk surface above the ring and an evanescent ring type are almost certainly edible, whereas those with a fibrous stalk surface above the ring and a pendant ring type are almost certainly poisonous.

Because the values were converted to the categorical codes to plot, we have generated a legend for the values of each attribute.



In [21]:

    
df = get_data_frame()
attr = get_attribute_dictionary()
labels = {}
for col in df:
    if col in ['odor', 'spore-print-color', 'gill-color', 'ring-type',
               'stalk-surface-above-ring', 'poisonous']:
        labels[col] = [attr[col][c] for c in df[col].cat.categories] + \
                      (12-len(df[col].cat.categories))*[" "]
pd.DataFrame(labels)









    Out[21]:






  
    
      
	gill-color	odor	poisonous	ring-type	spore-print-color	stalk-surface-above-ring
0	buff	almond	edible	evanescent	buff	fibrous
1	red	creosote	poisonous	flaring	chocolate	silky
2	gray	foul		large	black	smooth
3	chocolate	anise		none	brown	scaly
4	black	musty		pendant	orange	
5	brown	none			green	
6	orange	pungent			purple	
7	pink	spicy			white	
8	green	fishy			yellow	
9	purple					
10	white					
11	yellow					

Heatmaps

To get a better sense of the correlations we wanted to create a heat map that showed the correlation between all possible attributes.
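Each attribute value is treated here as a 0/1 indicator column, and the heat map shows the Pearson correlation between pairs of such columns. A minimal sketch of that correlation for two indicator lists is shown below; the function and the toy columns are hypothetical.

```python
from math import isnan, sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


# Hypothetical indicator columns for four rows
odor_foul = [1, 1, 0, 0]
odor_none = [0, 0, 1, 1]
is_poisonous = [1, 1, 0, 0]
veil_partial = [1, 1, 1, 1]

print(pearson(odor_foul, is_poisonous))
print(pearson(odor_none, is_poisonous))
print(isnan(pearson(veil_partial, is_poisonous)))
```

The undefined result for a constant column is the reason values that never vary produce empty rows in a correlation matrix.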



In [22]:

    
df = get_data_frame()
attr_dict = get_attribute_dictionary()

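data = []
for attribute in attr_dict:
    for sub_attr in attr_dict[attribute]:
        # One 0/1 indicator column per attribute value
        data.append((attr_dict[attribute][sub_attr],[1 if x==sub_attr else 0 for x in df[attribute]]))

# Collect the indicator columns once, after the loops have finished
l = [x[1] for x in data]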
    
corr_df = pd.DataFrame(np.array(l).transpose(), columns=[x[0] for x in data]).corr().dropna(thresh=1).drop("distant")


fig, ax = plt.subplots(figsize=(28,28))

sns.heatmap(corr_df)
fig.autofmt_xdate()

locs, labels = plt.yticks()
plt.setp(labels,rotation=30)

locs, labels = plt.xticks()
plt.setp(labels,rotation=60)


plt.show()

As you can see, there are far too many attribute values for this chart to be readable on its own.

To look more closely at some of the more strongly correlated attributes, we decided to create heat maps that display, for any two variables, the relative frequencies within each column. Below is our function to plot these heat maps.
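The quantity shown in each heat map cell is a column-relative frequency: among the rows that take a given value of the first attribute, the share that take each value of the second. A sketch of that computation on made-up data, with a hypothetical helper name:

```python
from collections import Counter, defaultdict


def column_relative_frequencies(attr1_values, attr2_values):
    """For each attr1 value, the share of its rows taking each attr2 value."""
    counts = defaultdict(Counter)
    for a, b in zip(attr1_values, attr2_values):
        counts[a][b] += 1

    freqs = {}
    for a, counter in counts.items():
        column_total = sum(counter.values())
        freqs[a] = {b: n / column_total for b, n in counter.items()}
    return freqs


# Hypothetical odor codes and class labels
odor = ["f", "f", "n", "n", "n", "n"]
label = ["p", "p", "e", "e", "e", "p"]
freqs = column_relative_frequencies(odor, label)
print(freqs)
```

Each column sums to one, so columns with very different row counts can still be compared directly.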



In [39]:

    
def heatmap(attr1, attr2,annot=True):
    df = get_data_frame_with_miss()
    labels1 = get_attribute_dictionary_with_miss()[attr1]
    labels2 = get_attribute_dictionary_with_miss()[attr2]

    data = []

    for a in df[attr1].cat.categories:
        column = df[attr2][df[attr1] == a].value_counts()/len(df[df[attr1]==a])
        data.append(column)

    d = pd.concat(data, axis=1)
    d.columns = [labels1[a] for a in df[attr1].cat.categories]

    ticks = [labels2[a] for a in d.index]

    sns.heatmap(d, annot=annot, yticklabels=ticks, fmt='.2f')


    plt.title("{} and {}".format(attr1, attr2))
    plt.yticks(rotation=0)
    plt.ylabel(attr2)
    plt.xlabel(attr1)
    plt.tight_layout()
    plt.show()
    plt.clf()



In [40]:

    
heatmap('gill-attachment', 'veil-color')



In [41]:

    
heatmap('odor', 'poisonous')

We found odor and edibility to be highly correlated, with some odors occurring exclusively in poisonous mushrooms (a relative frequency of 100%).



In [42]:

    
heatmap('stalk-shape', 'stalk-root')

We ended up dropping the stalk-root attribute because about 30% of the rows (2,480 of 8,124) are missing a stalk-root value. We thought the heat map was still interesting enough to include anyway. It shows how the missing stalk-root values could have been imputed from the stalk shape, had we decided that the stalk-root variable was important enough to keep.
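One simple way such an imputation could work is to fill each missing stalk-root value with the most common stalk root among rows that share the same stalk shape. This was not done in the notebook; the sketch below is only an illustration, with a hypothetical helper name, made-up rows, and `?` assumed as the missing-value marker.

```python
from collections import Counter, defaultdict


def impute_by_group_mode(group_values, target_values, missing="?"):
    """Replace missing targets with the most common target in the same group."""
    known = defaultdict(Counter)
    for group, target in zip(group_values, target_values):
        if target != missing:
            known[group][target] += 1

    filled = []
    for group, target in zip(group_values, target_values):
        if target == missing and known[group]:
            target = known[group].most_common(1)[0][0]
        filled.append(target)  # stays missing if the group has no known values
    return filled


# Hypothetical stalk-shape and stalk-root codes
stalk_shape = ["e", "e", "e", "t", "t", "t"]
stalk_root = ["b", "b", "?", "e", "?", "e"]
imputed = impute_by_group_mode(stalk_shape, stalk_root)
print(imputed)
```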



In [43]:

    
from IPython.display import IFrame
#Visualization forked from: https://bl.ocks.org/mbostock/1044242
IFrame("./visualizations/edge-chart/index.html",width=1000,height=0)









    Out[43]:

This hierarchical edge bundling graph represents how often each attribute value occurs together with another. If two attribute values appear in the same row at least a minimum number of times, they are connected in the graph. The bar at the top of the graph can be adjusted to show only the connections that meet that minimum.

As you can see, when we move the bar all the way to the right, we see that the most common combination of attribute values is veil-type partial, veil-color white and gill-attachment free. This observation matches up with our previous analysis.

However, when moving the bar to the opposite end, the graph reveals more about the data that isn't there. For example, some attribute values have no lines connecting to them even at a minimum of 25, which shows how small a share of the data each accounts for. These attribute values also seem to match up with the other conclusions we derived.
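The data behind a graph like this can be thought of as co-occurrence counts between attribute values, filtered by a minimum count. The sketch below shows one way to compute such counts; it is not the code behind the visualization, and the helper names and sample rows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(rows, columns):
    """Count how often each pair of attribute values appears in the same row."""
    counts = Counter()
    for row in rows:
        labelled = [f"{name}={value}" for name, value in zip(columns, row)]
        for pair in combinations(labelled, 2):
            counts[pair] += 1
    return counts


def edges_above(counts, minimum):
    """Keep only the pairs that co-occur at least `minimum` times."""
    return {pair: n for pair, n in counts.items() if n >= minimum}


# Hypothetical rows: (veil-type, veil-color, gill-attachment)
columns = ["veil-type", "veil-color", "gill-attachment"]
rows = [("p", "w", "f"), ("p", "w", "f"), ("p", "n", "a")]
counts = cooccurrence_counts(rows, columns)
edges = edges_above(counts, minimum=2)
print(edges)
```

Raising `minimum` plays the same role as moving the bar in the graph: only the most frequent combinations remain connected.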

Attempting MCA

As we only have categorical and binary data, we decided to attempt an MCA to reduce the number of features we were looking at. If we had numerical data we would have used PCA.

Unfortunately, we lacked the background to implement MCA from scratch and could not get the MCA Python module to work properly, owing to the lack of documentation and examples online. Here is as far as we could get using the example at: http://nbviewer.jupyter.org/github/esafak/mca/blob/master/docs/mca-BurgundiesExample.ipynb.



In [ ]:

    
import mca
import pandas as pd
import numpy as np

df = get_data_frame()

mca_ben = mca.mca(df,cols=["gill-color","stalk-surface-above-ring","ring-type","spore-print-color"], ncols=5)
mca_ind = mca.mca(df,cols=["gill-color","stalk-surface-above-ring","ring-type","spore-print-color"], ncols=5, benzecri=False)

print(mca_ben)
print(mca_ind)

At this point we were stuck, as we were not familiar enough with MCA to keep up with the example. The library lacks further documentation, and although we tried to implement the rest of the example, we could not get as far as the author of the original Jupyter Notebook did. However, we feel that we could have done a lot with MCA, as the majority of our attributes are fairly highly correlated with at least one other attribute (based on our heat maps).

Citations

Data set: https://archive.ics.uci.edu/ml/datasets/Mushroom
For picture 1: http://michaelagleatonportfolio2014.weebly.com/blue-milk-mushroom.html
Data Visualization Information: https://github.com/eclarson/MachineLearningNotebooks/blob/master/03.%20DataVisualization.ipynb
D3 Hierarchical Edge Bundling: https://bl.ocks.org/mbostock/1044242
