Your objective is to build a Bubble Plot that showcases the relationship between four key variables:

- Average Fare ($) Per City
- Total Number of Rides Per City
- Total Number of Drivers Per City
- City Type (Urban, Suburban, Rural)

In addition, you will be expected to produce the following three pie charts:

- % of Total Fares by City Type
- % of Total Rides by City Type
- % of Total Drivers by City Type

As final considerations:

- You must use the Pandas Library and the Jupyter Notebook.
- You must use the Matplotlib and Seaborn libraries.
- You must include a written description of three observable trends based on the data.
- You must use proper labeling of your plots, including aspects like: Plot Titles, Axes Labels, Legend Labels, Wedge Percentages, and Wedge Labels.
- Remember when making your plots to consider aesthetics!
  - You must stick to the Pyber color scheme (Gold, Light Sky Blue, and Light Coral) in producing your plot and pie charts.
  - When making your Bubble Plot, experiment with effects like alpha, edgecolor, and linewidths.
  - When making your Pie Chart, experiment with effects like shadow, startangle, and explosion.
- You must include an exported markdown version of your Notebook called README.md in your GitHub repository.
- See Example Solution for a reference on expected format.

In [2]:
# Dependencies

from matplotlib import pyplot as plt
from scipy import stats
import numpy as np
import pandas as pd
#import plotly.plotly as py

In [3]:
#Loading data 
city_data = pd.read_csv("city_data.csv")
ride_data = pd.read_csv("ride_data.csv")

In [4]:
#Understanding the data
#city_data.head()

In [5]:
#Identifying the columns 

#ride_data.shape 
#ride_data.columns
#Index(['city', 'date', 'fare', 'ride_id'], dtype='object')

#city_data.columns
#Index(['city', 'driver_count', 'type'], dtype='object')

In [6]:
# Average of the numeric columns per city; numeric_only skips the 'date' strings
data_grouped_mean = ride_data.groupby("city").mean(numeric_only=True)
#data_grouped_mean.type()
#data_grouped_mean = ride_data["city"].mean
#data_grouped_mean = ride_data.drop_duplicates("city")
#data_grouped_mean.head()
#len(data_grouped_mean)
#125

In [7]:
data_grouped_total_rides = ride_data.groupby(["city"]).count()
#data_grouped_total_rides.head()

In [8]:
# Sum only the driver counts so the text column 'type' is not aggregated
total_drivers = city_data.groupby("city")[["driver_count"]].sum()
#total_drivers.head()

Grouping

(Using .pivot_table)


In [17]:
# Per-city summary of the ride data
grouping_ride = ride_data.pivot_table(
           index = ["city"],
           values = ["fare", "ride_id"],
           aggfunc = {"fare": "mean",     # average fare per city
                      "ride_id": len},    # number of rides per city
           fill_value = 0)
#grouping_ride
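
The same per-city summary can also be sketched with `groupby` and named aggregation, which names the output columns explicitly. This is only an illustration on a tiny made-up frame; the city labels, the values, and the column names `avg_fare` and `total_rides` are assumptions, not part of the original data.

```python
import pandas as pd

# Hypothetical ride-level rows with the same columns as ride_data.csv
rides = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B"],
    "fare": [10.0, 20.0, 30.0, 40.0, 50.0],
    "ride_id": [1, 2, 3, 4, 5],
})

# Named aggregation: each keyword is an output column,
# each value is a (source column, aggregation) pair
per_city = (
    rides.groupby("city")
         .agg(avg_fare=("fare", "mean"), total_rides=("ride_id", "count"))
         .reset_index()
)
print(per_city)
```

Compared with `pivot_table`, this form avoids the later rename from `fare`/`ride_id` to more descriptive labels.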

In [13]:
# Driver count for each (city, type) pair
grouping_city = city_data.pivot_table(
           index = ["city", "type"],
           values = ["driver_count"],
           aggfunc = {"driver_count": "mean"},
           fill_value = 0)
#grouping_city
# Result: indexed by city and type, with driver_count as the only column

In [18]:
# Turn the pivot-table indexes back into columns, then join the two summaries on city
grouping_city.reset_index(inplace=True)
grouping_ride.reset_index(inplace=True)
Ride_Sharing = pd.merge(grouping_ride, grouping_city, how="left", on="city")
Ride_Sharing
# Columns: city, fare, ride_id, type, driver_count


Out[18]:
city fare ride_id type driver_count
0 Alvarezhaven 23.928710 31 Urban 21
1 Alyssaberg 20.609615 26 Urban 67
2 Anitamouth 37.315556 9 Suburban 16
3 Antoniomouth 23.625000 22 Urban 21
4 Aprilchester 21.981579 19 Urban 49
5 Arnoldview 25.106452 31 Urban 41
6 Campbellport 33.711333 15 Suburban 26
7 Carrollbury 36.606000 10 Suburban 4
8 Carrollfort 25.395517 29 Urban 55
9 Clarkstad 31.051667 12 Suburban 21
10 Conwaymouth 34.591818 11 Suburban 18
11 Davidtown 22.978095 21 Urban 73
12 Davistown 21.497200 25 Urban 25
13 East Cherylfurt 31.416154 13 Suburban 9
14 East Douglas 26.169091 22 Urban 12
15 East Erin 24.478214 28 Urban 43
16 East Jenniferchester 32.599474 19 Suburban 22
17 East Leslie 33.660909 11 Rural 9
18 East Stephen 39.053000 10 Rural 6
19 East Troybury 33.244286 7 Rural 3
20 Edwardsbury 26.876667 27 Urban 11
21 Erikport 30.043750 8 Rural 3
22 Eriktown 25.478947 19 Urban 15
23 Floresberg 32.310000 10 Suburban 7
24 Fosterside 23.034583 24 Urban 69
25 Hernandezshire 32.002222 9 Rural 10
26 Horneland 21.482500 4 Rural 8
27 Jacksonfort 32.006667 6 Rural 6
28 Jacobfort 24.779355 31 Urban 52
29 Jasonfort 27.831667 12 Suburban 25
... ... ... ... ... ...
95 South Roy 26.031364 22 Urban 35
96 South Shannonborough 26.516667 15 Suburban 9
97 Spencertown 23.681154 26 Urban 68
98 Stevensport 31.948000 5 Rural 6
99 Stewartview 21.614000 30 Urban 49
100 Swansonbury 27.464706 34 Urban 64
101 Thomastown 30.308333 24 Suburban 1
102 Tiffanyton 28.510000 13 Suburban 21
103 Torresshire 24.207308 26 Urban 70
104 Travisville 27.220870 23 Urban 37
105 Vickimouth 21.474667 15 Urban 13
106 Webstertown 29.721250 16 Suburban 26
107 West Alexis 19.523000 20 Urban 47
108 West Brandy 24.157667 30 Urban 12
109 West Brittanyton 25.436250 24 Urban 9
110 West Dawnfurt 22.330345 29 Urban 34
111 West Evan 27.013333 12 Suburban 4
112 West Jefferyfurt 21.072857 21 Urban 65
113 West Kevintown 21.528571 7 Rural 5
114 West Oscar 24.280000 29 Urban 11
115 West Pamelaborough 33.799286 14 Suburban 27
116 West Paulport 33.278235 17 Suburban 5
117 West Peter 24.875484 31 Urban 61
118 West Sydneyhaven 22.368333 18 Urban 70
119 West Tony 29.609474 19 Suburban 17
120 Williamchester 34.278182 11 Suburban 26
121 Williamshire 26.990323 31 Urban 70
122 Wiseborough 22.676842 19 Urban 55
123 Yolandafurt 27.205500 20 Urban 7
124 Zimmermanmouth 28.301667 24 Urban 45

125 rows × 5 columns
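
A left merge keeps every row of the left frame, so a city missing from the right frame would end up with empty `type` and `driver_count` values. One way to check for that is sketched below on made-up frames; the city labels and values are hypothetical, while `validate` and `indicator` are standard `pd.merge` parameters.

```python
import pandas as pd

# Hypothetical per-city summaries shaped like grouping_ride and grouping_city
rides = pd.DataFrame({"city": ["A", "B", "C"],
                      "fare": [15.0, 40.0, 22.0],
                      "ride_id": [2, 3, 1]})
cities = pd.DataFrame({"city": ["A", "B"],
                       "type": ["Urban", "Rural"],
                       "driver_count": [12, 3]})

# validate raises if a city appears more than once on either side;
# indicator adds a _merge column saying where each row came from
merged = pd.merge(rides, cities, how="left", on="city",
                  validate="one_to_one", indicator=True)

# Cities present in the ride data but absent from the city data
unmatched = merged.loc[merged["_merge"] == "left_only", "city"].tolist()
print(unmatched)
```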


In [ ]:
#help(pd.set_option)

Bubble Plot -

(.plot / matplotlib)


In [ ]:
#help(plt.plot)

In [19]:
# Bubble plot with matplotlib directly: x = drivers, y = average fare,
# and the third variable (ride count) sets the bubble size
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(Ride_Sharing['driver_count'], Ride_Sharing['fare'], s=Ride_Sharing['ride_id'])
plt.show()



In [20]:
Ride_Sharing.plot(kind="scatter", 
                  x = 'driver_count', 
                  y = 'fare',
                  s=Ride_Sharing['ride_id'])


Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fdc5f05978>
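
To bring in the fourth variable (city type) and the labeling the brief asks for, one option is to draw one scatter call per city type. The sketch below keeps the axis mapping used in the cells above (drivers on x, average fare on y, ride count as size) and uses six rows copied from the merged table. The size multiplier of 10, the title, and the axis labels are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Six rows copied from the Ride_Sharing output (fares rounded)
df = pd.DataFrame({
    "city": ["Alvarezhaven", "Alyssaberg", "Anitamouth",
             "Carrollbury", "East Leslie", "East Stephen"],
    "fare": [23.93, 20.61, 37.32, 36.61, 33.66, 39.05],
    "ride_id": [31, 26, 9, 10, 11, 10],
    "type": ["Urban", "Urban", "Suburban", "Suburban", "Rural", "Rural"],
    "driver_count": [21, 67, 16, 4, 9, 6],
})

# Pyber color scheme from the brief, keyed by city type
colors = {"Urban": "gold", "Suburban": "lightskyblue", "Rural": "lightcoral"}

fig, ax = plt.subplots()
for city_type, group in df.groupby("type"):
    ax.scatter(group["driver_count"], group["fare"],
               s=group["ride_id"] * 10,      # assumed scale factor for visibility
               color=colors[city_type],
               alpha=0.7, edgecolors="black", linewidths=1,
               label=city_type)

ax.set_title("Pyber Ride Sharing Data")      # assumed title
ax.set_xlabel("Total Number of Drivers (Per City)")
ax.set_ylabel("Average Fare ($)")
ax.legend(title="City Types")
```

Drawing each type separately gives every group its own legend entry without any extra bookkeeping.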

Bubble Plot -

(seaborn)


In [21]:
# Dependencies
import seaborn as sns
sns.set(color_codes=True)
# %matplotlib only selects the plotting backend; %pylab also star-imports
# numpy and matplotlib into the namespace and is discouraged
%matplotlib notebook

In [ ]:
#Ride_Sharing["ride_id"]
#Name: ride_id, Length: 125, dtype: int64

In [ ]:
#size = 100 * (Ride_Sharing["ride_id"] - Ride_Sharing["ride_id"].min()) / (Ride_Sharing["ride_id"].max() - Ride_Sharing["ride_id"].min())
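
As a rough illustration of min-max scaling for bubble sizes, the sketch below maps ride counts onto a fixed range of marker areas. The three ride counts and the 20 to 400 area range are assumptions picked for the example.

```python
import pandas as pd

# Hypothetical ride counts for three cities
rides = pd.Series([4, 19, 34], name="ride_id")

# Assumed marker-area range (matplotlib's s is in points squared)
min_area, max_area = 20, 400

# Min-max scaling to [0, 1], then stretch to the chosen area range
scaled = (rides - rides.min()) / (rides.max() - rides.min())
sizes = min_area + scaled * (max_area - min_area)
print(sizes.tolist())  # [20.0, 210.0, 400.0]
```

Keeping a non-zero minimum area means the city with the fewest rides is still visible on the plot.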

In [22]:
g = sns.lmplot(x="driver_count", 
               y="fare", 
               hue="type", 
               fit_reg=False,
               data=Ride_Sharing,
               height=5)  # figure height in inches; this argument was called size in older seaborn


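Option 1 did not work. Passing size=Ride_Sharing['ride_id'] to sns.lmplot raised "setting an array element with a sequence": lmplot's size argument (called height in newer seaborn releases) sets the figure height and expects a single number, not one value per point. A second attempt using Ride_Sharing[type].value_counts() raised "__array__() missing 1 required positional argument: 'self'".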

Pie Charts


In [23]:
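# % of Total Fares by City Type
# Note: 'fare' in Ride_Sharing is the average fare per city, so this sums
# per-city averages rather than every individual fare.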
faresBycity = Ride_Sharing.pivot_table(
           index = ["type"],
           values = ["fare"],
           aggfunc = {"fare": sum}, 
           #margins = True  ,
           #margins_name= "Total",
           fill_value = 0)
faresBycity


Out[23]:
fare
type
Rural 615.728572
Suburban 1268.627391
Urban 1623.863390
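
Because `fare` in the merged table is a per-city average, the figures above are sums of averages. If the chart is meant to show each city type's share of all fares collected, one possible approach is to attach the city type to every ride and sum the individual fares. The sketch below uses made-up frames; the city labels and fares are hypothetical.

```python
import pandas as pd

# Hypothetical ride-level and city-level rows
rides = pd.DataFrame({"city": ["A", "A", "B", "C"],
                      "fare": [10.0, 20.0, 40.0, 30.0]})
cities = pd.DataFrame({"city": ["A", "B", "C"],
                       "type": ["Urban", "Rural", "Urban"]})

# Attach the city type to every ride, then sum the individual fares per type
ride_level = rides.merge(cities, on="city", how="left")
total_fares = ride_level.groupby("type")["fare"].sum()

# Share of all fares, in percent
share = total_fares / total_fares.sum() * 100
print(share)
```

In this toy data the Urban total is 60 out of 100, whereas summing the per-city averages (15 and 30 for Urban, 40 for Rural) would give a different split.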

In [30]:
faresBycity.plot(kind="pie",
                  autopct='%1.1f%%',
                  startangle=90,
                  fontsize=17,
                  y = "fare",
                  explode = (0, 0, 0))        
# Tells matplotlib that we want a pie chart with equal axes
plt.axis("equal")
plt.savefig("PyPiesTotal.png")


Out[30]:
<matplotlib.axes._subplots.AxesSubplot at 0x1fdc8126b00>
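
The brief asks for the Pyber colors, wedge labels, and effects such as shadow and explode. A minimal sketch of a pie chart with those options is shown below, using the three fare totals printed above. The color-to-type assignment, the exploded wedge, and the title are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Values taken from the faresBycity output above
fares = pd.Series({"Rural": 615.728572,
                   "Suburban": 1268.627391,
                   "Urban": 1623.863390})

# Pyber colors, in the same order as the series index (assumed assignment)
colors = ["lightcoral", "lightskyblue", "gold"]
explode = (0, 0, 0.1)  # pull the Urban wedge out slightly

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    fares.values,
    labels=fares.index.tolist(),
    colors=colors,
    explode=explode,
    autopct="%1.1f%%",   # wedge percentages with one decimal
    shadow=True,
    startangle=90,
)
ax.set_title("% of Total Fares by City Type")
ax.axis("equal")  # equal aspect ratio keeps the pie circular
```

The same pattern should carry over to the drivers and rides charts by swapping in the corresponding series and title.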

In [33]:
# % of Total Drivers by City Type
DriversbyCity = Ride_Sharing.pivot_table(
           index = ["type"],
           values = ["driver_count"],
           aggfunc = {"driver_count": sum}, 
           #margins = True  ,
           #margins_name= "Total",
           fill_value = 0)
DriversbyCity


Out[33]:
driver_count
type
Rural 104
Suburban 629
Urban 2607

In [36]:
DriversbyCity.plot(kind="pie",
                  autopct='%1.1f%%',
                  startangle=90,
                  fontsize=17,
                  y = "driver_count",
                  explode = (0, 0, 0))        
# Tells matplotlib that we want a pie chart with equal axes
plt.axis("equal")
plt.savefig("PyPiesdriversTotal.png")


% of Total Rides by City Type


In [38]:
RidesbyCity = Ride_Sharing.pivot_table(
           index = ["type"],
           values = ["ride_id"],
           aggfunc = {"ride_id": sum}, 
           #margins = True  ,
           #margins_name= "Total",
           fill_value = 0)
RidesbyCity


Out[38]:
ride_id
type
Rural 125
Suburban 625
Urban 1625
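
As a quick numeric check on what the pie chart should display, the shares can be computed directly from the three totals above. This is a small sketch; only the variable names are new.

```python
import pandas as pd

# Ride totals per city type, copied from the RidesbyCity output above
rides_by_type = pd.Series({"Rural": 125, "Suburban": 625, "Urban": 1625})

# Each type's share of all rides, in percent, rounded to one decimal
pct = (rides_by_type / rides_by_type.sum() * 100).round(1)
print(pct.to_dict())  # {'Rural': 5.3, 'Suburban': 26.3, 'Urban': 68.4}
```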

In [39]:
RidesbyCity.plot(kind="pie",
                  autopct='%1.1f%%',
                  startangle=90,
                  fontsize=17,
                  y = "ride_id",
                  explode = (0, 0, 0))        
# Tells matplotlib that we want a pie chart with equal axes
plt.axis("equal")
plt.savefig("PyPiesRidesTotal.png")


