This is an exploration of the iris dataset.


In [11]:
# third-party imports
import pandas
from sklearn.datasets import load_iris

In [13]:
iris_data = load_iris()
iris_frame = pandas.DataFrame(iris_data.data, columns=iris_data.feature_names)

Data Exploration


In [5]:
iris_data.data.shape


Out[5]:
(150, 4)

In [14]:
iris_frame.describe()


Out[14]:
       sepal length (cm)  sepal width (cm)  petal length (cm)  \
count         150.000000        150.000000         150.000000   
mean            5.843333          3.054000           3.758667   
std             0.828066          0.433594           1.764420   
min             4.300000          2.000000           1.000000   
25%             5.100000          2.800000           1.600000   
50%             5.800000          3.000000           4.350000   
75%             6.400000          3.300000           5.100000   
max             7.900000          4.400000           6.900000   

       petal width (cm)  
count        150.000000  
mean           1.198667  
std            0.763161  
min            0.100000  
25%            0.300000  
50%            1.300000  
75%            1.800000  
max            2.500000  

There are 4 features and 150 data points.
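
The two counts come straight from the shape tuple: rows are data points and columns are features. As a minimal sketch, the snippet below unpacks the shape of a small stand-in frame (`toy_frame` and its three rows of illustrative values are assumptions made for this example, not the full dataset) so the two numbers get explicit names.

```python
import pandas

# Hypothetical stand-in for iris_frame: three illustrative rows,
# with the same four feature columns as the real frame.
toy_frame = pandas.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2],
        [4.9, 3.0, 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
    ],
    columns=[
        "sepal length (cm)",
        "sepal width (cm)",
        "petal length (cm)",
        "petal width (cm)",
    ],
)

# shape is (number of rows, number of columns)
n_samples, n_features = toy_frame.shape
print("data points:", n_samples)
print("features:", n_features)
```

Applied to `iris_frame` itself, the same unpacking should give the 150 and 4 reported above.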


In [20]:
for column in iris_frame.columns:
    print(column, iris_frame[column].hasnans)


('sepal length (cm)', False)
('sepal width (cm)', False)
('petal length (cm)', False)
('petal width (cm)', False)

There are no missing data points.
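
The per-column loop can also be written as a single vectorized check. The sketch below is one possible way to do it, using `DataFrame.isna()` on a small stand-in frame (`toy_frame` is an assumption for this example, with one missing value planted on purpose so the check has something to find).

```python
import pandas

# Hypothetical stand-in frame with one deliberately missing value.
toy_frame = pandas.DataFrame(
    {
        "sepal length (cm)": [5.1, float("nan"), 4.7],
        "sepal width (cm)": [3.5, 3.0, 3.2],
    }
)

# isna() marks each missing cell as True; summing counts them per column.
missing_per_column = toy_frame.isna().sum()

# any().any() collapses the frame to a single yes/no answer.
has_missing = bool(toy_frame.isna().any().any())

print(missing_per_column)
print("any missing values:", has_missing)
```

Run against `iris_frame`, every per-column count should be zero, which matches the `hasnans` results above.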