Auto MPG Data Set

This is a classic dataset that contains fuel-economy, engine, and body measurements for car models from the 1970s and early 1980s, one car per row. Some horsepower values are missing, so the code below tells read_csv to treat 'NA' and '?' as missing values. The following fields are present:

  • mpg - Fuel economy, in miles per gallon.
  • cylinders - The number of cylinders in the engine.
  • displacement - The engine displacement, in cubic inches.
  • horsepower - The horsepower produced by the engine.
  • weight - The weight of the car, in pounds.
  • acceleration - The acceleration of the car, measured as a time in seconds (a lower value means faster acceleration).
  • year - The model year, as two digits (for example, 76 means 1976).
  • origin - Where the car was produced: 1=USA, 2=Europe, 3=Asia.
  • name - The name (make and model) of the car.
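The origin and year fields are stored as small integer codes. The following is a minimal sketch of decoding them with pandas, assuming the 1=USA, 2=Europe, 3=Asia mapping listed above and assuming year holds the last two digits of a 1900s model year. The three rows are copied from the sample output in this section; the new column names origin_name and model_year are hypothetical.

```python
import pandas as pd

# Three rows taken from the sample output of this section.
df = pd.DataFrame({
    "name": ["honda civic", "ford ranger", "toyota corolla 1600 (sw)"],
    "year": [76, 82, 72],
    "origin": [3, 1, 3],
})

# Mapping from the field description: 1=USA, 2=Europe, 3=Asia.
ORIGIN_NAMES = {1: "USA", 2: "Europe", 3: "Asia"}

# Series.map replaces each code with its label; unknown codes become NaN.
df["origin_name"] = df["origin"].map(ORIGIN_NAMES)

# Assumption: year is a two-digit model year in the 1900s.
df["model_year"] = df["year"] + 1900

print(df[["name", "origin_name", "model_year"]])
```

Keeping the integer origin column alongside the label is useful because origin is a category, not a quantity; a model should usually see it through something like pd.get_dummies rather than as the numbers 1, 2, and 3.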

The following code loads the CSV file, shuffles the rows, and displays 10 of them.

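import os

import numpy as np
import pandas as pd

# Directory that holds auto-mpg.csv
path = "./data/"
filename = os.path.join(path, "auto-mpg.csv")

# Treat "NA" and "?" as missing values (horsepower has some gaps)
df = pd.read_csv(filename, na_values=['NA', '?'])

# Shuffle the rows reproducibly, then renumber the index 0..n-1
np.random.seed(42)
df = df.reindex(np.random.permutation(df.index))
df.reset_index(inplace=True, drop=True)

# Display the first 10 rows of the shuffled data
df[0:10]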

    mpg  cylinders  displacement  horsepower  weight  acceleration  year  origin  name
0  33.0          4          91.0        53.0    1795          17.4    76       3  honda civic
1  28.0          4         120.0        79.0    2625          18.6    82       1  ford ranger
2  19.0          6         232.0       100.0    2634          13.0    71       1  amc gremlin
3  13.0          8         318.0       150.0    3940          13.2    76       1  plymouth volare premier v8
4  14.0          8         318.0       150.0    4237          14.5    73       1  plymouth fury gran sedan
5  27.0          4          97.0        88.0    2100          16.5    72       3  toyota corolla 1600 (sw)
6  24.0          4         140.0        92.0    2865          16.4    82       1  ford fairmont futura
7  13.0          8         440.0       215.0    4735          11.0    73       1  chrysler new yorker brougham
8  17.0          8         260.0       110.0    4060          19.0    77       1  oldsmobile cutlass supreme
9  21.0          6         200.0         NaN    2875          17.0    74       1  ford maverick
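Row 9 shows NaN for horsepower: the na_values=['NA','?'] argument makes read_csv turn those strings into missing values instead of leaving the whole column as text. The sketch below illustrates this on a tiny inline CSV rather than the real file. It assumes the raw file marks a missing horsepower with '?', reuses three rows from the sample above, counts the missing values per column, and then fills the gap with the column median. Median filling is only one common option, shown here as an illustration.

```python
import io

import pandas as pd

# Hypothetical raw CSV text: three rows from the sample above, with the
# missing horsepower value written as "?" (assumed raw encoding).
raw = io.StringIO(
    "mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name\n"
    "19.0,6,232.0,100.0,2634,13.0,71,1,amc gremlin\n"
    "13.0,8,318.0,150.0,3940,13.2,76,1,plymouth volare premier v8\n"
    "21.0,6,200.0,?,2875,17.0,74,1,ford maverick\n"
)

# Same na_values as the loading code: "NA" and "?" both become NaN,
# so horsepower is parsed as a float column.
df = pd.read_csv(raw, na_values=["NA", "?"])

# Count missing values per column and show only the columns that have any.
missing = df.isna().sum()
print(missing[missing > 0])

# Replace the missing horsepower with the median of the known values
# (here the median of 100.0 and 150.0, which is 125.0).
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].median())
print(df[["name", "horsepower"]])
```

Dropping incomplete rows with df.dropna() is the other usual choice; filling keeps every car at the cost of an invented value.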