Basic Data Analysis and Visualization Using Python

by Yanal Kashou

This is a free dataset of Porsche asking prices from the RDatasets collection.

The data was obtained from the URLs below of the .csv data file and the .html documentation file, respectively:

It contains 30 observations on the following 3 variables.

  • Price: Asking price for the car (in $1,000's)
  • Age: Age of the car (in years)
  • Mileage: Previous miles driven (in 1,000's)

Load the Dataset


In [4]:
import pandas as pd
porsche = pd.read_csv("PorschePrice.csv")

Explore the Dataset


In [5]:
porsche.shape


Out[5]:
(30, 4)

In [6]:
porsche.head(5)


Out[6]:
Unnamed: 0 Price Age Mileage
0 1 69.4 3 21.5
1 2 56.9 3 43.0
2 3 49.9 2 19.9
3 4 47.4 4 36.0
4 5 42.9 4 44.0

In [7]:
porsche = porsche.rename(columns = {'Unnamed: 0':'Number'})

In [8]:
porsche.head(5)


Out[8]:
Number Price Age Mileage
0 1 69.4 3 21.5
1 2 56.9 3 43.0
2 3 49.9 2 19.9
3 4 47.4 4 36.0
4 5 42.9 4 44.0
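The "Unnamed: 0" column appears because the first column of the file has no header. As an alternative to renaming it, that column can be used as the row index when loading. A minimal sketch, assuming the file starts with an empty header field; the inline CSV text only reproduces the five rows shown above and stands in for PorschePrice.csv.

```python
import io
import pandas as pd

# Five rows copied from the head(5) output, written as CSV text.
# The empty first header field is an assumption about the file layout.
csv_text = """,Price,Age,Mileage
1,69.4,3,21.5
2,56.9,3,43.0
3,49.9,2,19.9
4,47.4,4,36.0
5,42.9,4,44.0
"""

# Default read: the header-less column is loaded as "Unnamed: 0".
default_df = pd.read_csv(io.StringIO(csv_text))

# index_col=0 turns that column into the row index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
indexed_df.index.name = "Number"

print(list(default_df.columns))
print(list(indexed_df.columns))
```

With the index approach the running number stays out of describe() and out of the plots, which may or may not be what is wanted here.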

In [9]:
porsche.describe()


Out[9]:
Number Price Age Mileage
count 30.000000 30.000000 30.000000 30.000000
mean 15.500000 50.536667 6.200000 34.872333
std 8.803408 15.542211 5.868678 23.504416
min 1.000000 16.000000 0.000000 0.670000
25% 8.250000 40.650000 3.000000 19.300000
50% 15.500000 51.900000 4.000000 33.150000
75% 22.750000 58.650000 7.750000 48.375000
max 30.000000 83.000000 22.000000 89.600000
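Each row of the describe() table can also be computed on its own, which helps when only one statistic is needed. A small sketch using just the five rows shown by head(5), so the values differ from the 30-row table above; the name sample is only for illustration.

```python
import pandas as pd

# Five rows copied from the head(5) output, not the full dataset.
sample = pd.DataFrame({
    "Price": [69.4, 56.9, 49.9, 47.4, 42.9],
    "Age": [3, 3, 2, 4, 4],
    "Mileage": [21.5, 43.0, 19.9, 36.0, 44.0],
})

summary = sample.describe()
price = sample["Price"]

# The same statistics computed one at a time.
manual = {
    "count": price.count(),
    "mean": price.mean(),
    "std": price.std(),            # sample standard deviation (ddof=1)
    "min": price.min(),
    "25%": price.quantile(0.25),
    "50%": price.median(),
    "75%": price.quantile(0.75),
    "max": price.max(),
}

# Print each manual value next to the describe() value for Price.
for name, value in manual.items():
    print(name, value, summary.loc[name, "Price"])
```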

Plotting Using Seaborn and PyPlot


In [10]:
import seaborn as sns
import matplotlib.pyplot as plt

Pairplot


In [11]:
sns.pairplot(porsche[["Price", "Age", "Mileage"]])
plt.show()


Radial Visualization


In [12]:
# Plotting helpers are in pandas.plotting (pandas.tools.plotting in old releases)
from pandas.plotting import radviz

plt.figure()
radviz(porsche, 'Age')
plt.show()
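For readers unfamiliar with the plot: in the usual RadViz construction each feature gets an anchor point spaced evenly on the unit circle, each column is scaled to the range 0 to 1, and a row is drawn at the weighted average of the anchors, with its scaled values as weights. A rough sketch of that idea for a single, already scaled row. The function radviz_point is hypothetical and is not part of pandas; the pandas implementation may differ in details.

```python
import numpy as np

def radviz_point(normalized_row):
    """Return the 2-D RadViz position of one row of values scaled to [0, 1]."""
    weights = np.asarray(normalized_row, dtype=float)
    n_features = len(weights)

    # One anchor per feature, evenly spaced around the unit circle.
    angles = 2.0 * np.pi * np.arange(n_features) / n_features
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])

    total = weights.sum()
    if total == 0:
        # All weights zero: placing the point at the centre is a choice
        # made for this sketch only.
        return np.zeros(2)

    # Weighted average of the anchors.
    return (anchors * weights[:, None]).sum(axis=0) / total

print(radviz_point([1.0, 0.0, 0.0]))  # pulled entirely toward the first anchor
print(radviz_point([1.0, 1.0, 1.0]))  # equal pull from every anchor
```

A row dominated by one feature lands near that feature's anchor, while a row with balanced values lands near the centre.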


Vertical Barchart


In [13]:
# DataFrame.plot creates its own figure, so no separate plt.figure() is needed
porsche.plot(kind = 'bar', stacked = True)
plt.show()



Horizontal Barchart


In [14]:
porsche.plot(kind='barh', stacked=True)
plt.show()


Histogram


In [15]:
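plt.figure()
# diff() takes row-to-row differences, so this is a histogram of the
# changes in Mileage between consecutive rows, not of Mileage itself
porsche['Mileage'].diff().hist(bins = 7)
plt.show()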
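Note that the cell above calls diff() before hist(), so the bars count differences between consecutive rows rather than the mileage values themselves. A short sketch of what diff() returns, using only the five Mileage values shown by head(5).

```python
import pandas as pd

# Mileage values copied from the head(5) output.
mileage = pd.Series([21.5, 43.0, 19.9, 36.0, 44.0], name="Mileage")

# Each entry minus the previous one; the first entry has no
# predecessor and becomes NaN.
changes = mileage.diff()
print(changes)

# Number of values a histogram would actually count in each case
# (hist() skips the NaN).
print(mileage.count(), changes.count())
```

To plot the distribution of the mileage itself, the same hist(bins = 7) call can be applied to porsche['Mileage'] without diff().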


Andrews Curves


In [30]:
# Plotting helpers are in pandas.plotting (pandas.tools.plotting in old releases)
from pandas.plotting import andrews_curves
plt.figure()
andrews_curves(porsche, 'Age', colormap = 'autumn')
plt.show()
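An Andrews curve turns each row into a function of t: the first value is divided by the square root of 2, and the remaining values become the coefficients of sin(t), cos(t), sin(2t), cos(2t) and so on, usually drawn for t between -pi and pi. A minimal sketch of that formula; the function andrews_curve is hypothetical and only illustrates the idea, it is not the pandas implementation.

```python
import numpy as np

def andrews_curve(row, t):
    """Evaluate the Andrews curve of one row of feature values at points t."""
    row = np.asarray(row, dtype=float)
    t = np.asarray(t, dtype=float)

    # Constant term: first feature divided by sqrt(2).
    result = np.full_like(t, row[0] / np.sqrt(2.0))

    # Remaining features alternate between sine and cosine terms,
    # moving to the next harmonic after each sine/cosine pair.
    for i, x in enumerate(row[1:], start=1):
        harmonic = (i + 1) // 2
        if i % 2 == 1:
            result = result + x * np.sin(harmonic * t)
        else:
            result = result + x * np.cos(harmonic * t)
    return result

# Price and Mileage of the first row shown by head(5), used as a two-feature example.
t = np.linspace(-np.pi, np.pi, 5)
print(andrews_curve([69.4, 21.5], t))
```

Rows with similar values give curves that stay close together, which is why the plot is grouped and coloured by the class column ('Age' in the cell above).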