The Iris Dataset

This classic dataset contains four measurements for each of 150 iris flowers. Three species are represented, with 50 flowers from each. There are no missing values. The following fields are present:

  • sepal_l - The sepal length in cm.
  • sepal_w - The sepal width in cm.
  • petal_l - The petal length in cm.
  • petal_w - The petal width in cm.
  • species - The species of the flower (Iris setosa, Iris versicolor, or Iris virginica).

The following code loads the file and displays its first 10 rows.


In [1]:
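import os

import numpy as np
import pandas as pd

path = "./data/"

filename = os.path.join(path, "iris.csv")
# Treat 'NA' and '?' as missing values when parsing the file
df = pd.read_csv(filename, na_values=['NA', '?'])

# Optional shuffle, left disabled so the rows appear in file order,
# as in the output below. Uncomment all three lines to shuffle;
# the seed makes the shuffle the same each time.
# np.random.seed(42)
# df = df.reindex(np.random.permutation(df.index))
# df.reset_index(inplace=True, drop=True)

# Display the first 10 rows
df[0:10]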


Out[1]:
   sepal_l  sepal_w  petal_l  petal_w      species
0      5.1      3.5      1.4      0.2  Iris-setosa
1      4.9      3.0      1.4      0.2  Iris-setosa
2      4.7      3.2      1.3      0.2  Iris-setosa
3      4.6      3.1      1.5      0.2  Iris-setosa
4      5.0      3.6      1.4      0.2  Iris-setosa
5      5.4      3.9      1.7      0.4  Iris-setosa
6      4.6      3.4      1.4      0.3  Iris-setosa
7      5.0      3.4      1.5      0.2  Iris-setosa
8      4.4      2.9      1.4      0.2  Iris-setosa
9      4.9      3.1      1.5      0.1  Iris-setosa
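
The description at the top makes three checkable claims: 150 rows, 50 flowers per species, and no missing values. The sketch below shows one way such claims could be verified with pandas. The names `summarize` and `demo` are hypothetical and not part of the original notebook, and `demo` holds only three rows copied from the sample output above so that the example runs without the CSV file.

```python
import pandas as pd


def summarize(df: pd.DataFrame, label_col: str = "species") -> dict:
    """Return the row count, per-class row counts, and total missing cells."""
    return {
        "rows": len(df),
        "per_class": df[label_col].value_counts().to_dict(),
        "missing": int(df.isna().sum().sum()),
    }


# Stand-in data: three rows copied from the sample output, not the full file.
demo = pd.DataFrame({
    "sepal_l": [5.1, 4.9, 4.7],
    "sepal_w": [3.5, 3.0, 3.2],
    "petal_l": [1.4, 1.4, 1.3],
    "petal_w": [0.2, 0.2, 0.2],
    "species": ["Iris-setosa", "Iris-setosa", "Iris-setosa"],
})

summary = summarize(demo)
print(summary)
```

Calling `summarize(df)` on the full DataFrame loaded earlier should, if the description holds, report 150 rows, 50 rows for each of the three species, and 0 missing values.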
