Data Manipulation with Numpy and Pandas

Handling with large data is easy in Python. In the simplest way using arrays. However, they are pretty slow. Numpy and Panda are two great libraries for dealing with datasets. Numpy isused for homogenous n-dimensional data (matrices). Pandas is used for heterogenous tables (CSV, MS Excel tables). Pandas is internally based on Numpy, too. See http://scipy-lectures.github.io/ for a more detailed lesson.



In [16]:

    
import numpy as np



In [17]:

    
# Generating a random array
X = np.random.random((3, 5))  # a 3 x 5 array

print(X)









    



[[ 0.77347902  0.45273892  0.15401666  0.4357073   0.74741831]
 [ 0.19892103  0.8583207   0.89480927  0.5322447   0.94306179]
 [ 0.48248829  0.10666105  0.30380058  0.74718615  0.25991219]]

Accessing elements



In [18]:

    
# get a single element
X[0, 0]









    Out[18]:





0.77347901713819489



In [19]:

    
# get a row
X[1]









    Out[19]:





array([ 0.19892103,  0.8583207 ,  0.89480927,  0.5322447 ,  0.94306179])



In [20]:

    
# get a column
X[:, 1]









    Out[20]:





array([ 0.45273892,  0.8583207 ,  0.10666105])



In [21]:

    
# Transposing an array
X.T









    Out[21]:





array([[ 0.77347902,  0.19892103,  0.48248829],
       [ 0.45273892,  0.8583207 ,  0.10666105],
       [ 0.15401666,  0.89480927,  0.30380058],
       [ 0.4357073 ,  0.5322447 ,  0.74718615],
       [ 0.74741831,  0.94306179,  0.25991219]])



In [22]:

    
print(X.shape)
print(X.reshape(5, 3)) #change the layout of the matrix









    



(3, 5)
[[ 0.77347902  0.45273892  0.15401666]
 [ 0.4357073   0.74741831  0.19892103]
 [ 0.8583207   0.89480927  0.5322447 ]
 [ 0.94306179  0.48248829  0.10666105]
 [ 0.30380058  0.74718615  0.25991219]]



In [23]:

    
# indexing by an array of integers (fancy indexing)
indices = np.array([3, 1, 0])
print(indices)
X[:, indices]









    



[3 1 0]






    Out[23]:





array([[ 0.4357073 ,  0.45273892,  0.77347902],
       [ 0.5322447 ,  0.8583207 ,  0.19892103],
       [ 0.74718615,  0.10666105,  0.48248829]])

Operations along an axis



In [24]:

    
X









    Out[24]:





array([[ 0.77347902,  0.45273892,  0.15401666,  0.4357073 ,  0.74741831],
       [ 0.19892103,  0.8583207 ,  0.89480927,  0.5322447 ,  0.94306179],
       [ 0.48248829,  0.10666105,  0.30380058,  0.74718615,  0.25991219]])



In [25]:

    
X.shape









    Out[25]:





(3, 5)



In [26]:

    
np.sum(X, axis=1) # 1...columns









    Out[26]:





array([ 2.56336021,  3.42735748,  1.90004825])



In [27]:

    
np.max(X, axis=0) # 0...rows









    Out[27]:





array([ 0.77347902,  0.8583207 ,  0.89480927,  0.74718615,  0.94306179])

A quick-ish introduction to Pandas

based on http://pandas.pydata.org/pandas-docs/stable/10min.html



In [28]:

    
import numpy as np
import pandas as pd



In [29]:

    
#use a standard dataset of heterogenous data
cars = pd.read_csv('data/mtcars.csv')
cars.head()









    Out[29]:






  
    
      
      car
      mpg
      cyl
      disp
      hp
      drat
      wt
      qsec
      vs
      am
      gear
      carb
    
  
  
    
      0
      Mazda RX4
      21.0
      6
      160
      110
      3.90
      2.620
      16.46
      0
      1
      4
      4
    
    
      1
      Mazda RX4 Wag
      21.0
      6
      160
      110
      3.90
      2.875
      17.02
      0
      1
      4
      4
    
    
      2
      Datsun 710
      22.8
      4
      108
      93
      3.85
      2.320
      18.61
      1
      1
      4
      1
    
    
      3
      Hornet 4 Drive
      21.4
      6
      258
      110
      3.08
      3.215
      19.44
      1
      0
      3
      1
    
    
      4
      Hornet Sportabout
      18.7
      8
      360
      175
      3.15
      3.440
      17.02
      0
      0
      3
      2



In [30]:

    
#list all columns
cars.columns









    Out[30]:





Index(['car', 'mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am',
       'gear', 'carb'],
      dtype='object')



In [31]:

    
#we want to use the car as the "primary key" of a row
cars.index = cars.pop('car')
cars.head()









    Out[31]:






  
    
      
      mpg
      cyl
      disp
      hp
      drat
      wt
      qsec
      vs
      am
      gear
      carb
    
    
      car
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      Mazda RX4
      21.0
      6
      160
      110
      3.90
      2.620
      16.46
      0
      1
      4
      4
    
    
      Mazda RX4 Wag
      21.0
      6
      160
      110
      3.90
      2.875
      17.02
      0
      1
      4
      4
    
    
      Datsun 710
      22.8
      4
      108
      93
      3.85
      2.320
      18.61
      1
      1
      4
      1
    
    
      Hornet 4 Drive
      21.4
      6
      258
      110
      3.08
      3.215
      19.44
      1
      0
      3
      1
    
    
      Hornet Sportabout
      18.7
      8
      360
      175
      3.15
      3.440
      17.02
      0
      0
      3
      2



In [32]:

    
#describe our dataset
cars.describe()



In [33]:

    
cars.sort_index(inplace=True)
cars.head()









    Out[33]:






  
    
      
      mpg
      cyl
      disp
      hp
      drat
      wt
      qsec
      vs
      am
      gear
      carb
    
    
      car
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      AMC Javelin
      15.2
      8
      304
      150
      3.15
      3.435
      17.30
      0
      0
      3
      2
    
    
      Cadillac Fleetwood
      10.4
      8
      472
      205
      2.93
      5.250
      17.98
      0
      0
      3
      4
    
    
      Camaro Z28
      13.3
      8
      350
      245
      3.73
      3.840
      15.41
      0
      0
      3
      4
    
    
      Chrysler Imperial
      14.7
      8
      440
      230
      3.23
      5.345
      17.42
      0
      0
      3
      4
    
    
      Datsun 710
      22.8
      4
      108
      93
      3.85
      2.320
      18.61
      1
      1
      4
      1



In [34]:

    
cars.sort_values('mpg').head(15)









    Out[34]:






  
    
      
      mpg
      cyl
      disp
      hp
      drat
      wt
      qsec
      vs
      am
      gear
      carb
    
    
      car
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      Cadillac Fleetwood
      10.4
      8
      472.0
      205
      2.93
      5.250
      17.98
      0
      0
      3
      4
    
    
      Lincoln Continental
      10.4
      8
      460.0
      215
      3.00
      5.424
      17.82
      0
      0
      3
      4
    
    
      Camaro Z28
      13.3
      8
      350.0
      245
      3.73
      3.840
      15.41
      0
      0
      3
      4
    
    
      Duster 360
      14.3
      8
      360.0
      245
      3.21
      3.570
      15.84
      0
      0
      3
      4
    
    
      Chrysler Imperial
      14.7
      8
      440.0
      230
      3.23
      5.345
      17.42
      0
      0
      3
      4
    
    
      Maserati Bora
      15.0
      8
      301.0
      335
      3.54
      3.570
      14.60
      0
      1
      5
      8
    
    
      AMC Javelin
      15.2
      8
      304.0
      150
      3.15
      3.435
      17.30
      0
      0
      3
      2
    
    
      Merc 450SLC
      15.2
      8
      275.8
      180
      3.07
      3.780
      18.00
      0
      0
      3
      3
    
    
      Dodge Challenger
      15.5
      8
      318.0
      150
      2.76
      3.520
      16.87
      0
      0
      3
      2
    
    
      Ford Pantera L
      15.8
      8
      351.0
      264
      4.22
      3.170
      14.50
      0
      1
      5
      4
    
    
      Merc 450SE
      16.4
      8
      275.8
      180
      3.07
      4.070
      17.40
      0
      0
      3
      3
    
    
      Merc 450SL
      17.3
      8
      275.8
      180
      3.07
      3.730
      17.60
      0
      0
      3
      3
    
    
      Merc 280C
      17.8
      6
      167.6
      123
      3.92
      3.440
      18.90
      1
      0
      4
      4
    
    
      Valiant
      18.1
      6
      225.0
      105
      2.76
      3.460
      20.22
      1
      0
      3
      1
    
    
      Hornet Sportabout
      18.7
      8
      360.0
      175
      3.15
      3.440
      17.02
      0
      0
      3
      2



In [35]:

    
cars.sort_values('hp', ascending=False).head()









    Out[35]:






  
    
      
      mpg
      cyl
      disp
      hp
      drat
      wt
      qsec
      vs
      am
      gear
      carb
    
    
      car
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      Maserati Bora
      15.0
      8
      301
      335
      3.54
      3.570
      14.60
      0
      1
      5
      8
    
    
      Ford Pantera L
      15.8
      8
      351
      264
      4.22
      3.170
      14.50
      0
      1
      5
      4
    
    
      Camaro Z28
      13.3
      8
      350
      245
      3.73
      3.840
      15.41
      0
      0
      3
      4
    
    
      Duster 360
      14.3
      8
      360
      245
      3.21
      3.570
      15.84
      0
      0
      3
      4
    
    
      Chrysler Imperial
      14.7
      8
      440
      230
      3.23
      5.345
      17.42
      0
      0
      3
      4

Selection

Note While many of the NumPy access methods work on DataFrames, use the pandas-specific data access methods, .at, .iat, .loc, .iloc and .ix.

See the Indexing section and below.



In [36]:

    
#single column
cars['mpg']
#depending on the name also cars.mpg works









    Out[36]:





car
AMC Javelin            15.2
Cadillac Fleetwood     10.4
Camaro Z28             13.3
Chrysler Imperial      14.7
Datsun 710             22.8
Dodge Challenger       15.5
Duster 360             14.3
Ferrari Dino           19.7
Fiat 128               32.4
Fiat X1-9              27.3
Ford Pantera L         15.8
Honda Civic            30.4
Hornet 4 Drive         21.4
Hornet Sportabout      18.7
Lincoln Continental    10.4
Lotus Europa           30.4
Maserati Bora          15.0
Mazda RX4              21.0
Mazda RX4 Wag          21.0
Merc 230               22.8
Merc 240D              24.4
Merc 280               19.2
Merc 280C              17.8
Merc 450SE             16.4
Merc 450SL             17.3
Merc 450SLC            15.2
Pontiac Firebird       19.2
Porsche 914-2          26.0
Toyota Corolla         33.9
Toyota Corona          21.5
Valiant                18.1
Volvo 142E             21.4
Name: mpg, dtype: float64



In [37]:

    
#or a slice of rows
cars[2:5]









    Out[37]:






  
    
      
      mpg
      cyl
      disp
      hp
      drat
      wt
      qsec
      vs
      am
      gear
      carb
    
    
      car
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      Camaro Z28
      13.3
      8
      350
      245
      3.73
      3.840
      15.41
      0
      0
      3
      4
    
    
      Chrysler Imperial
      14.7
      8
      440
      230
      3.23
      5.345
      17.42
      0
      0
      3
      4
    
    
      Datsun 710
      22.8
      4
      108
      93
      3.85
      2.320
      18.61
      1
      1
      4
      1



In [38]:

    
#by label = primary key
cars.loc['Fiat 128':'Lotus Europa']









    Out[38]:






  
    
      
      mpg
      cyl
      disp
      hp
      drat
      wt
      qsec
      vs
      am
      gear
      carb
    
    
      car
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      Fiat 128
      32.4
      4
      78.7
      66
      4.08
      2.200
      19.47
      1
      1
      4
      1
    
    
      Fiat X1-9
      27.3
      4
      79.0
      66
      4.08
      1.935
      18.90
      1
      1
      4
      1
    
    
      Ford Pantera L
      15.8
      8
      351.0
      264
      4.22
      3.170
      14.50
      0
      1
      5
      4
    
    
      Honda Civic
      30.4
      4
      75.7
      52
      4.93
      1.615
      18.52
      1
      1
      4
      2
    
    
      Hornet 4 Drive
      21.4
      6
      258.0
      110
      3.08
      3.215
      19.44
      1
      0
      3
      1
    
    
      Hornet Sportabout
      18.7
      8
      360.0
      175
      3.15
      3.440
      17.02
      0
      0
      3
      2
    
    
      Lincoln Continental
      10.4
      8
      460.0
      215
      3.00
      5.424
      17.82
      0
      0
      3
      4
    
    
      Lotus Europa
      30.4
      4
      95.1
      113
      3.77
      1.513
      16.90
      1
      1
      5
      2



In [39]:

    
#selection by position
cars.iloc[3]









    Out[39]:





mpg      14.700
cyl       8.000
disp    440.000
hp      230.000
drat      3.230
wt        5.345
qsec     17.420
vs        0.000
am        0.000
gear      3.000
carb      4.000
Name: Chrysler Imperial, dtype: float64



In [40]:

    
cars.iloc[3:5, 0:2]









    Out[40]:






  
    
      
      mpg
      cyl
    
    
      car
      
      
    
  
  
    
      Chrysler Imperial
      14.7
      8
    
    
      Datsun 710
      22.8
      4



In [41]:

    
cars[cars.cyl > 6] # more than 6 cylinders









    Out[41]:






  
    
      
      mpg
      cyl
      disp
      hp
      drat
      wt
      qsec
      vs
      am
      gear
      carb
    
    
      car
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      AMC Javelin
      15.2
      8
      304.0
      150
      3.15
      3.435
      17.30
      0
      0
      3
      2
    
    
      Cadillac Fleetwood
      10.4
      8
      472.0
      205
      2.93
      5.250
      17.98
      0
      0
      3
      4
    
    
      Camaro Z28
      13.3
      8
      350.0
      245
      3.73
      3.840
      15.41
      0
      0
      3
      4
    
    
      Chrysler Imperial
      14.7
      8
      440.0
      230
      3.23
      5.345
      17.42
      0
      0
      3
      4
    
    
      Dodge Challenger
      15.5
      8
      318.0
      150
      2.76
      3.520
      16.87
      0
      0
      3
      2
    
    
      Duster 360
      14.3
      8
      360.0
      245
      3.21
      3.570
      15.84
      0
      0
      3
      4
    
    
      Ford Pantera L
      15.8
      8
      351.0
      264
      4.22
      3.170
      14.50
      0
      1
      5
      4
    
    
      Hornet Sportabout
      18.7
      8
      360.0
      175
      3.15
      3.440
      17.02
      0
      0
      3
      2
    
    
      Lincoln Continental
      10.4
      8
      460.0
      215
      3.00
      5.424
      17.82
      0
      0
      3
      4
    
    
      Maserati Bora
      15.0
      8
      301.0
      335
      3.54
      3.570
      14.60
      0
      1
      5
      8
    
    
      Merc 450SE
      16.4
      8
      275.8
      180
      3.07
      4.070
      17.40
      0
      0
      3
      3
    
    
      Merc 450SL
      17.3
      8
      275.8
      180
      3.07
      3.730
      17.60
      0
      0
      3
      3
    
    
      Merc 450SLC
      15.2
      8
      275.8
      180
      3.07
      3.780
      18.00
      0
      0
      3
      3
    
    
      Pontiac Firebird
      19.2
      8
      400.0
      175
      3.08
      3.845
      17.05
      0
      0
      3
      2

Missing Data



In [42]:

    
cars_na = pd.read_csv('data/mtcars_with_nas.csv')



In [43]:

    
cars_na.isnull().head(4)









    Out[43]:






  
    
      
      car
      mpg
      cyl
      disp
      hp
      drat
      wt
      qsec
      vs
      am
      gear
      carb
    
  
  
    
      0
      False
      False
      False
      False
      False
      False
      False
      False
      False
      False
      False
      False
    
    
      1
      False
      False
      False
      True
      False
      False
      False
      False
      False
      False
      False
      False
    
    
      2
      False
      False
      False
      False
      False
      False
      False
      False
      False
      False
      False
      False
    
    
      3
      False
      False
      False
      False
      False
      False
      False
      False
      False
      False
      False
      False



In [44]:

    
#fill with a default value
cars_na.fillna(0).head(4)









    Out[44]:






  
    
      
      car
      mpg
      cyl
      disp
      hp
      drat
      wt
      qsec
      vs
      am
      gear
      carb
    
  
  
    
      0
      Mazda RX4
      21.0
      6
      160
      110
      3.90
      2.620
      16.46
      0
      1
      4
      4
    
    
      1
      Mazda RX4 Wag
      21.0
      6
      0
      110
      3.90
      2.875
      17.02
      0
      1
      4
      4
    
    
      2
      Datsun 710
      22.8
      4
      108
      93
      3.85
      2.320
      18.61
      1
      1
      4
      1
    
    
      3
      Hornet 4 Drive
      21.4
      6
      258
      110
      3.08
      3.215
      19.44
      1
      0
      3
      1



In [45]:

    
#or drop the rows

print(cars_na.shape)
#drop rows with na values
print(cars_na.dropna().shape)
#drop columns with na values
print(cars_na.dropna(axis=1).shape)

#see also http://pandas.pydata.org/pandas-docs/stable/missing_data.html









    



(32, 12)
(29, 12)
(32, 10)

Statistics

besides .describe() there are plenty of other staticial measures and aggregation methods in Pandas/Numpy



In [46]:

    
#stats
cars.mean()









    Out[46]:





mpg      20.090625
cyl       6.187500
disp    230.721875
hp      146.687500
drat      3.596562
wt        3.217250
qsec     17.848750
vs        0.437500
am        0.406250
gear      3.687500
carb      2.812500
dtype: float64



In [47]:

    
cars.mean(axis=1)









    Out[47]:





car
AMC Javelin            46.007727
Cadillac Fleetwood     66.232727
Camaro Z28             58.752727
Chrysler Imperial      65.972273
Datsun 710             23.598182
Dodge Challenger       47.240909
Duster 360             59.720000
Ferrari Dino           34.508182
Fiat 128               19.440909
Fiat X1-9              18.928636
Ford Pantera L         60.971818
Honda Civic            17.742273
Hornet 4 Drive         38.739545
Hornet Sportabout      53.664545
Lincoln Continental    66.058545
Lotus Europa           24.880273
Maserati Bora          63.155455
Mazda RX4              29.907273
Mazda RX4 Wag          29.981364
Merc 230               27.233636
Merc 240D              24.634545
Merc 280               31.860000
Merc 280C              31.787273
Merc 450SE             46.430909
Merc 450SL             46.500000
Merc 450SLC            46.350000
Pontiac Firebird       57.379545
Porsche 914-2          24.779091
Toyota Corolla         18.814091
Toyota Corona          24.888636
Valiant                35.049091
Volvo 142E             26.262727
dtype: float64



In [48]:

    
#grouping
cars.groupby('cyl').mean()



In [49]:

    
#grouping different aggregation methods
cars.groupby('cyl').agg({ 'mpg': 'mean', 'qsec': 'min'})

TASKS

Now there are a series of simple tasks



In [50]:

    
#loading gapminder data (taken from https://github.com/jennybc/gapminder)
# file located at 'data/gapminder-unfiltered.tsv' it uses tabular character as separator
# use the first column as index
gap = pd.read_csv('data/gapminder-unfiltered.tsv',index_col=0, sep='\t')



In [51]:

    
#what are the columns of this dataset?
gap.head()









    Out[51]:






  
    
      
      continent
      year
      lifeExp
      pop
      gdpPercap
    
    
      country
      
      
      
      
      
    
  
  
    
      Afghanistan
      Asia
      1952
      28.801
      8425333
      779.445314
    
    
      Afghanistan
      Asia
      1957
      30.332
      9240934
      820.853030
    
    
      Afghanistan
      Asia
      1962
      31.997
      10267083
      853.100710
    
    
      Afghanistan
      Asia
      1967
      34.020
      11537966
      836.197138
    
    
      Afghanistan
      Asia
      1972
      36.088
      13079460
      739.981106



In [52]:

    
#what is the maximal year contained?
gap['year'].max()









    Out[52]:





2007



In [53]:

    
#just select all data of the year 2007
gap2007 = gap[gap.year == 2007]



In [54]:

    
#locate Austria and print it
gap2007.loc['Austria']









    Out[54]:





continent         Europe
year                2007
lifeExp           79.829
pop          8.19978e+06
gdpPercap        36126.5
Name: Austria, dtype: object



In [55]:

    
#list the top 10 countries by life expectancy (lifeExp)
gap2007.sort_values('lifeExp',ascending=False).head(10)









    Out[55]:






  
    
      
      continent
      year
      lifeExp
      pop
      gdpPercap
    
    
      country
      
      
      
      
      
    
  
  
    
      Japan
      Asia
      2007
      82.603
      127467972
      31656.06806
    
    
      Hong Kong, China
      Asia
      2007
      82.208
      6980412
      39724.97867
    
    
      Iceland
      Europe
      2007
      81.757
      301931
      36180.78919
    
    
      Switzerland
      Europe
      2007
      81.701
      7554661
      37506.41907
    
    
      Australia
      Oceania
      2007
      81.235
      20434176
      34435.36744
    
    
      Spain
      Europe
      2007
      80.941
      40448191
      28821.06370
    
    
      Sweden
      Europe
      2007
      80.884
      9031088
      33859.74835
    
    
      Israel
      Asia
      2007
      80.745
      6426679
      25523.27710
    
    
      Macao, China
      Asia
      2007
      80.718
      456989
      54589.82024
    
    
      France
      Europe
      2007
      80.657
      61083916
      30470.01670



In [56]:

    
#what is the total population (pop) per continent
gap2007.groupby('continent').agg({ 'pop': 'sum'})

(Interactive) Plotting using Matplotlib and Seaborn

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
count	32.000000	32.000000	32.000000	32.000000	32.000000	32.000000	32.000000	32.000000	32.000000	32.000000	32.0000
mean	20.090625	6.187500	230.721875	146.687500	3.596563	3.217250	17.848750	0.437500	0.406250	3.687500	2.8125
std	6.026948	1.785922	123.938694	68.562868	0.534679	0.978457	1.786943	0.504016	0.498991	0.737804	1.6152
min	10.400000	4.000000	71.100000	52.000000	2.760000	1.513000	14.500000	0.000000	0.000000	3.000000	1.0000
25%	15.425000	4.000000	120.825000	96.500000	3.080000	2.581250	16.892500	0.000000	0.000000	3.000000	2.0000
50%	19.200000	6.000000	196.300000	123.000000	3.695000	3.325000	17.710000	0.000000	0.000000	4.000000	2.0000
75%	22.800000	8.000000	326.000000	180.000000	3.920000	3.610000	18.900000	1.000000	1.000000	4.000000	4.0000
max	33.900000	8.000000	472.000000	335.000000	4.930000	5.424000	22.900000	1.000000	1.000000	5.000000	8.0000

	mpg	disp	hp	drat	wt	qsec	vs	am	gear	carb
cyl
4	26.663636	105.136364	82.636364	4.070909	2.285727	19.137273	0.909091	0.727273	4.090909	1.545455
6	19.742857	183.314286	122.285714	3.585714	3.117143	17.977143	0.571429	0.428571	3.857143	3.428571
8	15.100000	353.100000	209.214286	3.229286	3.999214	16.772143	0.000000	0.142857	3.285714	3.500000

	pop
continent
Africa	929963305
Americas	901346796
Asia	3840766818
Europe	593085338
FSU	255612291
Oceania	32945296

	car	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
0	Mazda RX4	21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
1	Mazda RX4 Wag	21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
2	Datsun 710	22.8	4	108	93	3.85	2.320	18.61	1	1	4	1
3	Hornet 4 Drive	21.4	6	258	110	3.08	3.215	19.44	1	0	3	1
4	Hornet Sportabout	18.7	8	360	175	3.15	3.440	17.02	0	0	3	2

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
car
AMC Javelin	15.2	8	304	150	3.15	3.435	17.30	0	0	3	2
Cadillac Fleetwood	10.4	8	472	205	2.93	5.250	17.98	0	0	3	4
Camaro Z28	13.3	8	350	245	3.73	3.840	15.41	0	0	3	4
Chrysler Imperial	14.7	8	440	230	3.23	5.345	17.42	0	0	3	4
Datsun 710	22.8	4	108	93	3.85	2.320	18.61	1	1	4	1

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
car
Cadillac Fleetwood	10.4	8	472.0	205	2.93	5.250	17.98	0	0	3	4
Lincoln Continental	10.4	8	460.0	215	3.00	5.424	17.82	0	0	3	4
Camaro Z28	13.3	8	350.0	245	3.73	3.840	15.41	0	0	3	4
Duster 360	14.3	8	360.0	245	3.21	3.570	15.84	0	0	3	4
Chrysler Imperial	14.7	8	440.0	230	3.23	5.345	17.42	0	0	3	4
Maserati Bora	15.0	8	301.0	335	3.54	3.570	14.60	0	1	5	8
AMC Javelin	15.2	8	304.0	150	3.15	3.435	17.30	0	0	3	2
Merc 450SLC	15.2	8	275.8	180	3.07	3.780	18.00	0	0	3	3
Dodge Challenger	15.5	8	318.0	150	2.76	3.520	16.87	0	0	3	2
Ford Pantera L	15.8	8	351.0	264	4.22	3.170	14.50	0	1	5	4
Merc 450SE	16.4	8	275.8	180	3.07	4.070	17.40	0	0	3	3
Merc 450SL	17.3	8	275.8	180	3.07	3.730	17.60	0	0	3	3
Merc 280C	17.8	6	167.6	123	3.92	3.440	18.90	1	0	4	4
Valiant	18.1	6	225.0	105	2.76	3.460	20.22	1	0	3	1
Hornet Sportabout	18.7	8	360.0	175	3.15	3.440	17.02	0	0	3	2

	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
car
Fiat 128	32.4	4	78.7	66	4.08	2.200	19.47	1	1	4	1
Fiat X1-9	27.3	4	79.0	66	4.08	1.935	18.90	1	1	4	1
Ford Pantera L	15.8	8	351.0	264	4.22	3.170	14.50	0	1	5	4
Honda Civic	30.4	4	75.7	52	4.93	1.615	18.52	1	1	4	2
Hornet 4 Drive	21.4	6	258.0	110	3.08	3.215	19.44	1	0	3	1
Hornet Sportabout	18.7	8	360.0	175	3.15	3.440	17.02	0	0	3	2
Lincoln Continental	10.4	8	460.0	215	3.00	5.424	17.82	0	0	3	4
Lotus Europa	30.4	4	95.1	113	3.77	1.513	16.90	1	1	5	2

	car	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
0	False	False	False	False	False	False	False	False	False	False	False	False
1	False	False	False	True	False	False	False	False	False	False	False	False
2	False	False	False	False	False	False	False	False	False	False	False	False
3	False	False	False	False	False	False	False	False	False	False	False	False

	continent	year	lifeExp	pop	gdpPercap
country
Afghanistan	Asia	1952	28.801	8425333	779.445314
Afghanistan	Asia	1957	30.332	9240934	820.853030
Afghanistan	Asia	1962	31.997	10267083	853.100710
Afghanistan	Asia	1967	34.020	11537966	836.197138
Afghanistan	Asia	1972	36.088	13079460	739.981106

	continent	year	lifeExp	pop	gdpPercap
country
Japan	Asia	2007	82.603	127467972	31656.06806
Hong Kong, China	Asia	2007	82.208	6980412	39724.97867
Iceland	Europe	2007	81.757	301931	36180.78919
Switzerland	Europe	2007	81.701	7554661	37506.41907
Australia	Oceania	2007	81.235	20434176	34435.36744
Spain	Europe	2007	80.941	40448191	28821.06370
Sweden	Europe	2007	80.884	9031088	33859.74835
Israel	Asia	2007	80.745	6426679	25523.27710
Macao, China	Asia	2007	80.718	456989	54589.82024
France	Europe	2007	80.657	61083916	30470.01670