Assignment 1

  • Perform a basic statistical analysis of the time DOT 311 (table is called dot_311) complaints are open (subtract closed date from created date)
  • Connect to the database to get the data and do the analysis. Submit the code through Github and type up your results in your PR

Host: training.c1erymiua9dx.us-east-1.rds.amazonaws.com

Port: 5432

Database: training

Username: dot_student

Password: qgis


In [75]:
import pg8000
import pandas as pd

In [76]:
conn = pg8000.connect(host='training.c1erymiua9dx.us-east-1.rds.amazonaws.com',port=5432,database='training',user='dot_student',password='qgis')

In [77]:
conn.rollback()

In [71]:
cursor = conn.cursor()
statement="SELECT * from INFORMATION_SCHEMA.COLUMNs where table_name='dot_311'"
cursor.execute(statement)
columns=[]
for row in cursor.fetchall():
    columns.append(row[3])
print(columns)


['gid', 'unique_key', 'agency', 'agency nam', 'complaint', 'descriptor', 'location t', 'incident z', 'incident a', 'street nam', 'cross stre', 'cross st_1', 'intersecti', 'intersec_1', 'address ty', 'city', 'landmark', 'facility t', 'status', 'due date', 'resolution', 'resoluti_1', 'community', 'borough', 'x coordina', 'y coordina', 'park facil', 'park borou', 'school nam', 'school num', 'school reg', 'school cod', 'school pho', 'school add', 'school cit', 'school sta', 'school zip', 'school not', 'school or', 'vehicle ty', 'taxi compa', 'taxi pick', 'bridge hig', 'bridge h_1', 'road ramp', 'bridge h_2', 'garage lot', 'ferry dire', 'ferry term', 'latitude', 'longitude', 'location', 'geom', 'created_date', 'closed_date']

In [95]:
cursor = conn.cursor()
cursor.execute("SELECT unique_key,agency,complaint,descriptor,created_date,closed_date,borough from dot_311")
complaints=[]
for item in cursor:
    complaints.append(item)

In [145]:
df = pd.DataFrame(complaints)
df.columns = ["unique_key","agency","complaint","descriptor","created_date","closed_date","borough"]

In [146]:
df['efficiency']=df['closed_date']-df['created_date']

In [99]:
df


Out[99]:
unique_key agency complaint descriptor created_date closed_date borough efficiency
0 32570549 DOT Traffic Signal Condition Controller 2016-02-01 00:12:00 2016-02-01 01:15:00 STATEN ISLAND 0 days 01:03:00
1 32572958 DOT Traffic Signal Condition Controller 2016-02-01 00:14:00 2016-02-01 00:54:00 STATEN ISLAND 0 days 00:40:00
2 32573576 DOT Traffic Signal Condition Controller 2016-02-01 00:15:00 2016-02-01 01:30:00 STATEN ISLAND 0 days 01:15:00
3 32572389 DOT Traffic Signal Condition Controller 2016-02-01 00:17:00 2016-02-01 02:35:00 MANHATTAN 0 days 02:18:00
4 32570548 DOT Traffic Signal Condition Controller 2016-02-01 00:19:00 2016-02-01 01:00:00 MANHATTAN 0 days 00:41:00
5 32574044 DOT Traffic Signal Condition Controller 2016-02-01 00:21:00 2016-02-01 02:00:00 MANHATTAN 0 days 01:39:00
6 32573047 DOT Traffic Signal Condition Controller 2016-02-01 00:22:00 2016-02-01 01:30:00 MANHATTAN 0 days 01:08:00
7 32572971 DOT Traffic Signal Condition Controller 2016-02-01 00:27:00 2016-02-01 01:05:00 BROOKLYN 0 days 00:38:00
8 32575571 DOT Traffic Signal Condition Base Door 2016-02-01 00:32:00 2016-02-01 00:42:00 BRONX 0 days 00:10:00
9 32576008 DOT Street Condition Pothole 2016-02-01 00:32:16 2016-02-01 22:30:00 MANHATTAN 0 days 21:57:44
10 32573945 DOT Street Condition Plate Condition - Noisy 2016-02-01 00:40:25 2016-02-01 20:44:42 MANHATTAN 0 days 20:04:17
11 32570682 DOT Street Light Condition Street Light Out 2016-02-01 00:44:00 2016-02-08 12:29:00 BROOKLYN 7 days 11:45:00
12 32572906 DOT Traffic Signal Condition Post 2016-02-01 00:50:00 2016-02-10 09:10:00 BROOKLYN 9 days 08:20:00
13 32579102 DOT Traffic Signal Condition Pedestrian Signal 2016-02-01 00:53:00 2016-02-01 01:13:00 BRONX 0 days 00:20:00
14 32578446 DOT Street Light Condition Lamppost Knocked Down 2016-02-01 14:26:00 NaT BRONX NaT
15 32573686 DOT Street Light Condition Street Light Out 2016-02-01 00:57:00 2016-02-01 13:45:00 BROOKLYN 0 days 12:48:00
16 32570317 DOT Street Condition Line/Marking - Faded 2016-02-01 01:18:49 2016-02-03 16:59:00 BROOKLYN 2 days 15:40:11
17 32581124 DOT Street Light Condition Street Light Out 2016-02-01 01:20:00 2016-02-05 12:04:00 QUEENS 4 days 10:44:00
18 32579277 DOT Street Light Condition Street Light Out 2016-02-01 01:36:00 2016-02-04 10:37:00 BRONX 3 days 09:01:00
19 32580720 DOT Street Light Condition Street Light Out 2016-02-01 01:36:00 2016-02-02 13:09:00 STATEN ISLAND 1 days 11:33:00
20 32571895 DOT Street Condition Pothole 2016-02-01 01:51:36 2016-02-01 10:23:00 QUEENS 0 days 08:31:24
21 32580285 DOT Street Light Condition Street Light Out 2016-02-01 02:07:00 2016-02-08 15:25:00 QUEENS 7 days 13:18:00
22 32575419 DOT Sidewalk Condition Newspaper Box Complaint 2016-02-01 02:32:08 2016-02-04 12:06:14 BROOKLYN 3 days 09:34:06
23 32570671 DOT Street Condition Pothole 2016-02-01 02:43:01 2016-02-01 09:40:00 BROOKLYN 0 days 06:56:59
24 32581170 DOT Street Light Condition Street Light Out 2016-02-01 03:21:00 2016-02-02 14:57:00 BROOKLYN 1 days 11:36:00
25 32581434 DOT Traffic Signal Condition Controller 2016-02-01 03:24:00 2016-02-01 11:30:00 MANHATTAN 0 days 08:06:00
26 32575447 DOT Street Light Condition Street Light Out 2016-02-01 03:25:00 2016-02-01 13:44:00 BROOKLYN 0 days 10:19:00
27 32577440 DOT Street Light Condition Street Light Out 2016-02-01 03:26:00 2016-02-03 10:03:00 MANHATTAN 2 days 06:37:00
28 32577560 DOT Traffic Signal Condition LED Pedestrian Unit 2016-02-01 03:37:00 2016-02-01 04:00:00 BRONX 0 days 00:23:00
29 32575033 DOT Street Condition Pothole 2016-02-01 03:49:22 2016-02-01 09:57:00 BROOKLYN 0 days 06:07:38
... ... ... ... ... ... ... ... ...
5821 32614085 DOT Street Condition Plate Condition - Shifted 2016-02-06 22:07:24 2016-02-09 15:23:13 BRONX 2 days 17:15:49
5822 32612339 DOT Street Condition Plate Condition - Noisy 2016-02-06 22:23:07 2016-02-08 03:08:37 MANHATTAN 1 days 04:45:30
5823 32612333 DOT Street Condition Plate Condition - Noisy 2016-02-06 22:25:28 2016-02-08 03:09:49 MANHATTAN 1 days 04:44:21
5824 32614717 DOT Street Light Condition Street Light Out 2016-02-06 22:27:00 2016-02-08 11:26:00 QUEENS 1 days 12:59:00
5825 32617685 DOT Street Light Condition Street Light Out 2016-02-06 22:32:00 2016-02-09 12:07:00 BROOKLYN 2 days 13:35:00
5826 32615195 DOT Street Condition Plate Condition - Noisy 2016-02-06 22:34:14 2016-02-08 03:08:37 MANHATTAN 1 days 04:34:23
5827 32613947 DOT Highway Condition Pothole - Highway 2016-02-06 22:34:34 2016-02-08 14:39:00 BROOKLYN 1 days 16:04:26
5828 32612566 DOT Street Light Condition Lamppost Knocked Down 2016-02-06 22:35:00 2016-03-16 10:39:00 BROOKLYN 38 days 12:04:00
5829 32614725 DOT Street Light Condition Street Light Out 2016-02-06 22:41:00 2016-02-11 09:43:00 QUEENS 4 days 11:02:00
5830 32617705 DOT Street Sign - Missing Bus Stop 2016-02-06 22:42:29 2016-02-26 10:13:26 MANHATTAN 19 days 11:30:57
5831 32620216 DOT Street Condition Pothole 2016-02-06 23:35:42 2016-02-07 20:40:00 MANHATTAN 0 days 21:04:18
5832 32617828 DOT Curb Condition Defacement 2016-02-06 22:47:30 2016-02-12 15:18:14 BRONX 5 days 16:30:44
5833 32618570 DOT Street Sign - Missing Bus Stop 2016-02-06 22:49:43 2016-02-22 10:04:38 MANHATTAN 15 days 11:14:55
5834 32616052 DOT Street Condition Cave-in 2016-02-06 22:53:35 2016-02-17 09:21:23 BROOKLYN 10 days 10:27:48
5835 32612279 DOT Street Condition Plate Condition - Shifted 2016-02-06 22:55:05 2016-02-08 04:09:00 MANHATTAN 1 days 05:13:55
5836 32614241 DOT Street Condition Plate Condition - Noisy 2016-02-06 22:56:01 2016-02-08 13:32:54 MANHATTAN 1 days 14:36:53
5837 32617383 DOT Street Light Condition Street Light Out 2016-02-06 22:59:00 2016-02-08 11:27:00 QUEENS 1 days 12:28:00
5838 32618573 DOT Street Condition Pothole 2016-02-06 23:02:17 2016-02-08 23:46:00 MANHATTAN 2 days 00:43:43
5839 32617669 DOT Street Condition Pothole 2016-02-06 23:02:35 2016-02-16 09:45:00 MANHATTAN 9 days 10:42:25
5840 32612533 DOT Street Light Condition Street Light Out 2016-02-06 23:06:00 2016-02-08 21:50:00 MANHATTAN 1 days 22:44:00
5841 32618378 DOT Traffic Signal Condition Controller 2016-02-06 23:09:00 2016-02-07 03:30:00 BRONX 0 days 04:21:00
5842 32622707 DOT Street Condition Pothole 2016-02-06 23:33:49 2016-02-07 17:05:00 BRONX 0 days 17:31:11
5843 32622721 DOT Street Condition Pothole 2016-02-06 23:35:02 2016-02-07 19:40:00 MANHATTAN 0 days 20:04:58
5844 32613217 DOT Street Condition Plate Condition - Shifted 2016-02-06 23:36:15 2016-02-08 06:52:08 MANHATTAN 1 days 07:15:53
5845 32623981 DOT Street Condition Pothole 2016-02-06 23:36:20 2016-02-07 21:10:00 MANHATTAN 0 days 21:33:40
5846 32617301 DOT Street Light Condition Street Light Out 2016-02-06 23:37:00 2016-02-08 22:50:00 MANHATTAN 1 days 23:13:00
5847 32614083 DOT Street Condition Cave-in 2016-02-06 23:46:22 NaT MANHATTAN NaT
5848 32612807 DOT Street Light Condition Street Light Out 2016-02-06 23:53:00 2016-02-12 22:30:00 MANHATTAN 5 days 22:37:00
5849 32617735 DOT Bridge Condition Pothole 2016-02-06 23:54:02 2016-02-08 10:07:29 QUEENS 1 days 10:13:27
5850 32619242 DOT Street Light Condition Street Light Out 2016-02-07 00:00:00 2016-02-10 15:19:00 BRONX 3 days 15:19:00

5851 rows × 8 columns


In [104]:
%matplotlib inline

In [112]:
df['efficiency'].describe()


Out[112]:
count                       5650
mean      6 days 15:05:05.589380
std      14 days 12:05:38.260805
min           -19 days +09:29:00
25%              0 days 01:14:00
50%              0 days 21:48:15
75%       4 days 00:30:48.500000
max             89 days 18:54:00
Name: efficiency, dtype: object

In [204]:
df['eff_new']=df['closed_date']-df['created_date']

In [205]:
def get_days(t):
    try:
        return t.days
    except:
        pass

df['eff_new'] = df['eff_new'].apply(get_days)

In [206]:
df['eff_new']


Out[206]:
0        0.0
1        0.0
2        0.0
3        0.0
4        0.0
5        0.0
6        0.0
7        0.0
8        0.0
9        0.0
10       0.0
11       7.0
12       9.0
13       0.0
14       NaN
15       0.0
16       2.0
17       4.0
18       3.0
19       1.0
20       0.0
21       7.0
22       3.0
23       0.0
24       1.0
25       0.0
26       0.0
27       2.0
28       0.0
29       0.0
        ... 
5821     2.0
5822     1.0
5823     1.0
5824     1.0
5825     2.0
5826     1.0
5827     1.0
5828    38.0
5829     4.0
5830    19.0
5831     0.0
5832     5.0
5833    15.0
5834    10.0
5835     1.0
5836     1.0
5837     1.0
5838     2.0
5839     9.0
5840     1.0
5841     0.0
5842     0.0
5843     0.0
5844     1.0
5845     0.0
5846     1.0
5847     NaN
5848     5.0
5849     1.0
5850     3.0
Name: eff_new, dtype: float64

In [207]:
df[df['eff_new']>0]['eff_new'].hist(bins=20)


Out[207]:
<matplotlib.axes._subplots.AxesSubplot at 0x1169c8d68>

In [116]:
df2 = pd.DataFrame(complaints)
df2.columns = ["unique_key","agency","complaint","descriptor","created_date","closed_date","borough"]
df2['efficiency']=df2['closed_date']-df2['created_date']

In [125]:
df2.index=df2['efficiency']

In [168]:
df2.groupby(by=df2.index.days)['unique_key'].count().plot()


Out[168]:
<matplotlib.axes._subplots.AxesSubplot at 0x11623c5f8>

In [156]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5851 entries, 0 to 5850
Data columns (total 8 columns):
unique_key      5851 non-null object
agency          5851 non-null object
complaint       5851 non-null object
descriptor      5851 non-null object
created_date    5851 non-null datetime64[ns]
closed_date     5650 non-null datetime64[ns]
borough         5851 non-null object
efficiency      5650 non-null timedelta64[ns]
dtypes: datetime64[ns](2), object(5), timedelta64[ns](1)
memory usage: 365.8+ KB

In [162]:
df.groupby(by='complaint').count()['unique_key'].sort_values().plot(kind="barh")


Out[162]:
<matplotlib.axes._subplots.AxesSubplot at 0x11065ae10>

In [163]:
df.groupby(by='borough').count()['unique_key'].sort_values().plot(kind="barh")


Out[163]:
<matplotlib.axes._subplots.AxesSubplot at 0x110f12f98>

In [210]:
new_df = df[df['eff_new']>0]
new_df['eff_new'].describe()


Out[210]:
count    2607.000000
mean       14.333333
std        18.007218
min         1.000000
25%         1.000000
50%         4.000000
75%        28.000000
max        89.000000
Name: eff_new, dtype: float64

In [211]:
new_df['eff_new'].median()


Out[211]:
4.0

In [213]:
new_df['eff_new'].mode()


Out[213]:
0    1.0
dtype: float64

In [212]:
# UAL
28+27*1.5


Out[212]:
68.5

In [216]:
# outlier
new_df[new_df['eff_new']>68.5].count()


Out[216]:
unique_key      33
agency          33
complaint       33
descriptor      33
created_date    33
closed_date     33
borough         33
efficiency      33
eff_new         33
dtype: int64

Conclusion

  • In terms of borough, Queens has the most complaints, the second and the third is Brooklyn and Manhattan.
  • Street light condition and stree condition is the most common reason for a complaint, much more than other reasons.
  • If we don't consider the "null" data and bad data, the average time of close time is 14 days, with the min of same day closing and max of 89 days. Median closing day equals to 4. “Middle” 50% of data is in range of same day to 28 days. the standard deviation is 18, which seems to be high. There are 33 outliers.

In [ ]:


In [ ]: