Background on Python, iPython, Jupyter and Pandas

Python is a high-level general purpose programming language named after a British comedy troup, created by a Dutch programmer as a hobby project and maintained by an international group of friendly but opinionated python enthusiasts (import this!). Until June 2018, Guido van Rossum was the Benevolent dictator for life for the Python language, now decisions are made jointly by the Python Steering Council.

Python is popular for data science because it's powerful, fast, plays well with others, runs everywhere, is easy to learn, highly readable, and open. Because it's general purpose it can be used for full-stack development. It's got a growing list of useful libraries for scientitic programming, data manipulation, data analysis. (Numpy, Scipy, Pandas, Scikit-Learn, Statsmodels, Matplotlib, Pybrain, etc.)

iPython is an enhanced, interactive python interpreter started as a grad school project by Fernando Perez. iPython (jupyter) notebooks allow you to run a multi-language (Python, R, Julia, Markdown, LaTex, etc) interpreter in your browser to create rich, portable, and sharable code documents.

Pandas is a libary created by Wes McKinney that introduces the R-like dataframe object to Python and makes working with data in Python a lot easier. It's also a lot more efficient than the R dataframe and pretty much makes Python superior to R in every imaginable way (except for ggplot 2).

Getting started with Jupyter (iPython) Notebooks

To start up a Jupyter notebook server, simply navigate to the directory where you want the notebooks to be saved and run the command

jupyter notebook

A browser should open with a notebook navigator. Click the "New" button and select "Python 3".

A beautiful blank notebook should open in a new tab

Name the notebook by clicking on "Untitled" at the top of the page.

Notebooks are squences of cells. Cells can be markdown, code, or raw text. Change the first cell to markdown and briefly describe what you are going to do in the notebook.

Getting started with Pandas

We start by importing the libraries we're going to use: pandas and matplotlib


In [222]:
# Import Statements
import pandas as pd
import numpy as np
%matplotlib inline

In [223]:
crimes = pd.read_csv('chicago_past_year_crimes.csv')

Loading data into a Pandas DataFrame

So far we've been working with raw text files. That's one way to store and interact with data, but there are only a limited set of functions that can take as input raw text. Python has an amazing array of of data structures to work with that give you a lot of extra power in working with data.

Built-in Data Structures

  • strings ""
  • lists []
  • tuples ()
  • sets {}
  • dictionaries {'key':value}

Additional Essential Data Structures

  • numpy arrys ([])
  • pandas Series
  • pandas DataFrame
  • tensorflow Tensors

Today we'll primarily be working with the pandas DataFrame. The pandas DataFrame is a two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes. It's basically a spreadsheet you can program and it's an incredibly useful Python object for data analysis.

You can load data into a dataframe using Pandas' excellent read_* functions.

We're going to try two of them: read_table & read_csv

Pro tip: TAB COMPLETION!

Pro tip: jupyter will pull the doc string for a command just by asking it a question.

Pro tip: jupyter will give you the allowable arguments if you hit shift + tab

Viewing data in pandas

There are lots of options for viewing data in pandas. Just like we did in the command line, you can use head and tail to get a quick view of our data.


In [224]:
crimes.head()


Out[224]:
CASE# DATE OF OCCURRENCE BLOCK IUCR PRIMARY DESCRIPTION SECONDARY DESCRIPTION LOCATION DESCRIPTION ARREST DOMESTIC BEAT WARD FBI CD X COORDINATE Y COORDINATE LATITUDE LONGITUDE LOCATION
0 JB241987 04/28/2018 10:05:00 PM 009XX N LONG AVE 2092 NARCOTICS SOLICIT NARCOTICS ON PUBLICWAY SIDEWALK Y N 1524 37.0 18 1140136.0 1905903.0 41.897895 -87.760744 (41.897894893, -87.760743714)
1 JA430240 09/06/2017 01:30:00 PM 032XX W 26TH ST 0810 THEFT OVER $500 OTHER Y N 1024 12.0 06 1155313.0 1886555.0 41.844510 -87.705519 (41.844510467, -87.705519454)
2 JB241350 04/28/2018 08:00:00 AM 008XX E 53RD ST 1320 CRIMINAL DAMAGE TO VEHICLE STREET N N 233 5.0 14 1182892.0 1870055.0 41.798635 -87.604823 (41.798635468, -87.604823241)
3 JB245397 04/28/2018 09:00:00 AM 062XX S MICHIGAN AVE 0820 THEFT $500 AND UNDER RESIDENCE PORCH/HALLWAY N N 311 20.0 06 1178263.0 1863570.0 41.780946 -87.621995 (41.780946398, -87.621995369)
4 JB241444 04/28/2018 12:15:00 PM 046XX N ELSTON AVE 0890 THEFT FROM BUILDING SMALL RETAIL STORE N N 1722 39.0 06 1146646.0 1930549.0 41.965404 -87.736202 (41.965404069, -87.736202402)

In [225]:
crimes.tail()


Out[225]:
CASE# DATE OF OCCURRENCE BLOCK IUCR PRIMARY DESCRIPTION SECONDARY DESCRIPTION LOCATION DESCRIPTION ARREST DOMESTIC BEAT WARD FBI CD X COORDINATE Y COORDINATE LATITUDE LONGITUDE LOCATION
261959 JA385552 08/10/2017 10:30:00 AM 048XX W IOWA ST 1310 CRIMINAL DAMAGE TO PROPERTY RESIDENCE N N 1531 37.0 14 1143912.0 1905538.0 41.896823 -87.746884 (41.8968233, -87.74688383)
261960 JA492228 10/30/2017 04:00:00 PM 072XX S SOUTH SHORE DR 0810 THEFT OVER $500 APARTMENT N Y 334 7.0 06 1194878.0 1857803.0 41.764728 -87.561272 (41.764728045, -87.561272312)
261961 JA390514 08/13/2017 08:00:00 PM 039XX W VAN BUREN ST 0910 MOTOR VEHICLE THEFT AUTOMOBILE STREET N N 1132 24.0 07 1150218.0 1897735.0 41.875290 -87.723926 (41.875290412, -87.723926225)
261962 JA385930 08/10/2017 04:00:00 AM 071XX S WOLCOTT AVE 1305 CRIMINAL DAMAGE CRIMINAL DEFACEMENT RESIDENCE N N 735 17.0 14 1164921.0 1857378.0 41.764247 -87.671085 (41.764247182, -87.671084615)
261963 JA536019 12/02/2017 08:00:00 PM 083XX S INDIANA AVE 1320 CRIMINAL DAMAGE TO VEHICLE STREET N N 632 6.0 14 1179046.0 1849699.0 41.742865 -87.619547 (41.742865052, -87.619546732)

In [226]:
crimes.shape


Out[226]:
(261964, 17)

In [227]:
crimes.dtypes


Out[227]:
CASE#                      object
DATE  OF OCCURRENCE        object
BLOCK                      object
 IUCR                      object
 PRIMARY DESCRIPTION       object
 SECONDARY DESCRIPTION     object
 LOCATION DESCRIPTION      object
ARREST                     object
DOMESTIC                   object
BEAT                        int64
WARD                      float64
FBI CD                     object
X COORDINATE              float64
Y COORDINATE              float64
LATITUDE                  float64
LONGITUDE                 float64
LOCATION                   object
dtype: object

Pro tip: you'll notice that some commands have looked like pd.something(), some like data.something(), and some like data.something without (). The difference is a pandas function or class vs methods vs attributes. Methods are actions you take on a dataframe or series, while attributes are descriptors or the dataframe or series.

Modifying your dataframe

Notice that we have some issues when it comes to our column names. Let's fix that by learning how to edit and delete columns.


In [228]:
crimes.columns


Out[228]:
Index(['CASE#', 'DATE  OF OCCURRENCE', 'BLOCK', ' IUCR',
       ' PRIMARY DESCRIPTION', ' SECONDARY DESCRIPTION',
       ' LOCATION DESCRIPTION', 'ARREST', 'DOMESTIC', 'BEAT', 'WARD', 'FBI CD',
       'X COORDINATE', 'Y COORDINATE', 'LATITUDE', 'LONGITUDE', 'LOCATION'],
      dtype='object')

Notice that some of the column names have spaces at the start or end of the name. Let's remove those so that


In [229]:
# remove white spaces
crimes.columns = crimes.columns.str.strip()
crimes.columns


Out[229]:
Index(['CASE#', 'DATE  OF OCCURRENCE', 'BLOCK', 'IUCR', 'PRIMARY DESCRIPTION',
       'SECONDARY DESCRIPTION', 'LOCATION DESCRIPTION', 'ARREST', 'DOMESTIC',
       'BEAT', 'WARD', 'FBI CD', 'X COORDINATE', 'Y COORDINATE', 'LATITUDE',
       'LONGITUDE', 'LOCATION'],
      dtype='object')

In [230]:
# replacing spaces with underscore
crimes.columns = crimes.columns.str.replace(' ', '_')
crimes.columns


Out[230]:
Index(['CASE#', 'DATE__OF_OCCURRENCE', 'BLOCK', 'IUCR', 'PRIMARY_DESCRIPTION',
       'SECONDARY_DESCRIPTION', 'LOCATION_DESCRIPTION', 'ARREST', 'DOMESTIC',
       'BEAT', 'WARD', 'FBI_CD', 'X_COORDINATE', 'Y_COORDINATE', 'LATITUDE',
       'LONGITUDE', 'LOCATION'],
      dtype='object')

In [231]:
# We'll also remove the double "_" in DATE__OF_OCCURENCE
crimes.columns = crimes.columns.str.replace('__', '_')
crimes.columns


Out[231]:
Index(['CASE#', 'DATE_OF_OCCURRENCE', 'BLOCK', 'IUCR', 'PRIMARY_DESCRIPTION',
       'SECONDARY_DESCRIPTION', 'LOCATION_DESCRIPTION', 'ARREST', 'DOMESTIC',
       'BEAT', 'WARD', 'FBI_CD', 'X_COORDINATE', 'Y_COORDINATE', 'LATITUDE',
       'LONGITUDE', 'LOCATION'],
      dtype='object')

The LOCATION Column seems redundant, seeing that we also have X_COORDINATE and Y_COORDINATE columns. Let's drop it.


In [232]:
crimes.drop('LOCATION', axis=1, inplace=True)

In [233]:
crimes.columns


Out[233]:
Index(['CASE#', 'DATE_OF_OCCURRENCE', 'BLOCK', 'IUCR', 'PRIMARY_DESCRIPTION',
       'SECONDARY_DESCRIPTION', 'LOCATION_DESCRIPTION', 'ARREST', 'DOMESTIC',
       'BEAT', 'WARD', 'FBI_CD', 'X_COORDINATE', 'Y_COORDINATE', 'LATITUDE',
       'LONGITUDE'],
      dtype='object')

Describing the entire dataframe

Now that we have columns, we want to get a better global view of our data. There are several ways


In [234]:
crimes.describe()


Out[234]:
BEAT WARD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE
count 261964.000000 261962.000000 2.598360e+05 2.598360e+05 259836.000000 259836.000000
mean 1144.641558 23.403978 1.165019e+06 1.886737e+06 41.844793 -87.669937
std 701.490463 14.185806 1.609511e+04 3.125165e+04 0.085941 0.058598
min 111.000000 1.000000 1.092706e+06 1.813909e+06 41.644608 -87.934273
25% 611.000000 10.000000 1.153404e+06 1.859536e+06 41.770021 -87.712091
50% 1031.000000 24.000000 1.166852e+06 1.894302e+06 41.865716 -87.663540
75% 1723.000000 35.000000 1.176556e+06 1.909030e+06 41.906247 -87.627706
max 2535.000000 50.000000 1.205119e+06 1.951535e+06 42.022671 -87.524529

In [235]:
crimes.describe(include=['O'])


Out[235]:
CASE# DATE_OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC FBI_CD
count 261964 261964 261964 261964 261964 261964 261198 261964 261964 261964
unique 261931 117272 27712 328 32 308 133 2 2 26
top JA432970 01/01/2018 12:01:00 AM 001XX N STATE ST 0820 THEFT SIMPLE STREET N N 06
freq 4 66 981 24343 63821 29268 58779 211660 219693 63821

In [236]:
crimes.isnull().sum()


Out[236]:
CASE#                       0
DATE_OF_OCCURRENCE          0
BLOCK                       0
IUCR                        0
PRIMARY_DESCRIPTION         0
SECONDARY_DESCRIPTION       0
LOCATION_DESCRIPTION      766
ARREST                      0
DOMESTIC                    0
BEAT                        0
WARD                        2
FBI_CD                      0
X_COORDINATE             2128
Y_COORDINATE             2128
LATITUDE                 2128
LONGITUDE                2128
dtype: int64

Selecting and subsetting in pandas

One of the biggest benefits of having a multi-index object like a DataFrame is the ability to easily select rows, columns, and subsets of the data. Let's learn how to do that.

First we will select individual series from the dataframe.


In [237]:
crimes['PRIMARY_DESCRIPTION'].head()


Out[237]:
0          NARCOTICS
1              THEFT
2    CRIMINAL DAMAGE
3              THEFT
4              THEFT
Name: PRIMARY_DESCRIPTION, dtype: object

In [238]:
#using . notation
crimes.PRIMARY_DESCRIPTION.head()


Out[238]:
0          NARCOTICS
1              THEFT
2    CRIMINAL DAMAGE
3              THEFT
4              THEFT
Name: PRIMARY_DESCRIPTION, dtype: object

In [239]:
# get value counts
crimes.PRIMARY_DESCRIPTION.value_counts()


Out[239]:
THEFT                                63821
BATTERY                              48917
CRIMINAL DAMAGE                      28085
ASSAULT                              19580
DECEPTIVE PRACTICE                   17672
OTHER OFFENSE                        16407
BURGLARY                             12104
NARCOTICS                            11448
ROBBERY                              11292
MOTOR VEHICLE THEFT                  10815
CRIMINAL TRESPASS                     6881
WEAPONS VIOLATION                     4867
OFFENSE INVOLVING CHILDREN            2222
CRIM SEXUAL ASSAULT                   1515
PUBLIC PEACE VIOLATION                1427
INTERFERENCE WITH PUBLIC OFFICER      1151
SEX OFFENSE                            970
PROSTITUTION                           649
HOMICIDE                               627
ARSON                                  382
LIQUOR LAW VIOLATION                   207
KIDNAPPING                             191
GAMBLING                               180
STALKING                               168
INTIMIDATION                           138
OBSCENITY                               91
CONCEALED CARRY LICENSE VIOLATION       83
NON-CRIMINAL                            45
PUBLIC INDECENCY                        10
OTHER NARCOTIC VIOLATION                10
HUMAN TRAFFICKING                        8
NON-CRIMINAL (SUBJECT SPECIFIED)         1
Name: PRIMARY_DESCRIPTION, dtype: int64

In [240]:
# selecting two columns
crimes[['PRIMARY_DESCRIPTION', 'SECONDARY_DESCRIPTION']].head()


Out[240]:
PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION
0 NARCOTICS SOLICIT NARCOTICS ON PUBLICWAY
1 THEFT OVER $500
2 CRIMINAL DAMAGE TO VEHICLE
3 THEFT $500 AND UNDER
4 THEFT FROM BUILDING

In [241]:
#subset by row index
crimes.PRIMARY_DESCRIPTION[3:10]


Out[241]:
3                  THEFT
4                  THEFT
5                  ARSON
6    MOTOR VEHICLE THEFT
7                ROBBERY
8                BATTERY
9                ROBBERY
Name: PRIMARY_DESCRIPTION, dtype: object

In [242]:
#Use the iloc method 
crimes.iloc[10:20,4:6]


Out[242]:
PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION
10 THEFT FROM BUILDING
11 OTHER OFFENSE TELEPHONE THREAT
12 THEFT FROM BUILDING
13 BURGLARY ATTEMPT FORCIBLE ENTRY
14 BATTERY DOMESTIC BATTERY SIMPLE
15 OFFENSE INVOLVING CHILDREN ENDANGERING LIFE/HEALTH CHILD
16 BATTERY SIMPLE
17 OTHER OFFENSE VIOLATE ORDER OF PROTECTION
18 THEFT $500 AND UNDER
19 BATTERY DOMESTIC BATTERY SIMPLE

Now let's subset on row values.


In [243]:
#Create a boolean series based on a condition
theft_bool = crimes['PRIMARY_DESCRIPTION']=='THEFT'
theft_bool


Out[243]:
0         False
1          True
2         False
3          True
4          True
5         False
6         False
7         False
8         False
9         False
10         True
11        False
12         True
13        False
14        False
15        False
16        False
17        False
18         True
19        False
20        False
21         True
22        False
23        False
24        False
25        False
26        False
27        False
28        False
29        False
          ...  
261934     True
261935    False
261936    False
261937    False
261938    False
261939    False
261940     True
261941    False
261942    False
261943     True
261944    False
261945    False
261946    False
261947     True
261948    False
261949     True
261950     True
261951     True
261952     True
261953    False
261954    False
261955    False
261956    False
261957    False
261958    False
261959    False
261960     True
261961    False
261962    False
261963    False
Name: PRIMARY_DESCRIPTION, Length: 261964, dtype: bool

In [244]:
#now pass that series to the datafram to subset it
theft = crimes[theft_bool]
theft.head()


Out[244]:
CASE# DATE_OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT WARD FBI_CD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE
1 JA430240 09/06/2017 01:30:00 PM 032XX W 26TH ST 0810 THEFT OVER $500 OTHER Y N 1024 12.0 06 1155313.0 1886555.0 41.844510 -87.705519
3 JB245397 04/28/2018 09:00:00 AM 062XX S MICHIGAN AVE 0820 THEFT $500 AND UNDER RESIDENCE PORCH/HALLWAY N N 311 20.0 06 1178263.0 1863570.0 41.780946 -87.621995
4 JB241444 04/28/2018 12:15:00 PM 046XX N ELSTON AVE 0890 THEFT FROM BUILDING SMALL RETAIL STORE N N 1722 39.0 06 1146646.0 1930549.0 41.965404 -87.736202
10 JB241656 04/28/2018 02:30:00 PM 012XX S LAKE SHORE DR E 0890 THEFT FROM BUILDING OTHER N N 132 2.0 06 1178818.0 1894964.0 41.867081 -87.619004
12 JB243959 04/28/2018 08:00:00 PM 068XX S ASHLAND AVE 0890 THEFT FROM BUILDING SMALL RETAIL STORE N N 725 17.0 06 1166850.0 1859459.0 41.769917 -87.663955

In [245]:
#now pass that series to the datafram to subset it
theft = crimes[crimes['PRIMARY_DESCRIPTION']=='THEFT']
theft.head()


Out[245]:
CASE# DATE_OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT WARD FBI_CD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE
1 JA430240 09/06/2017 01:30:00 PM 032XX W 26TH ST 0810 THEFT OVER $500 OTHER Y N 1024 12.0 06 1155313.0 1886555.0 41.844510 -87.705519
3 JB245397 04/28/2018 09:00:00 AM 062XX S MICHIGAN AVE 0820 THEFT $500 AND UNDER RESIDENCE PORCH/HALLWAY N N 311 20.0 06 1178263.0 1863570.0 41.780946 -87.621995
4 JB241444 04/28/2018 12:15:00 PM 046XX N ELSTON AVE 0890 THEFT FROM BUILDING SMALL RETAIL STORE N N 1722 39.0 06 1146646.0 1930549.0 41.965404 -87.736202
10 JB241656 04/28/2018 02:30:00 PM 012XX S LAKE SHORE DR E 0890 THEFT FROM BUILDING OTHER N N 132 2.0 06 1178818.0 1894964.0 41.867081 -87.619004
12 JB243959 04/28/2018 08:00:00 PM 068XX S ASHLAND AVE 0890 THEFT FROM BUILDING SMALL RETAIL STORE N N 725 17.0 06 1166850.0 1859459.0 41.769917 -87.663955

In [274]:
crimes[(crimes['PRIMARY_DESCRIPTION']=='CRIMINAL DAMAGE')].head()


Out[274]:
CASE# DATE_OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT WARD FBI_CD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE
2 JB241350 04/28/2018 08:00:00 AM 008XX E 53RD ST 1320 CRIMINAL DAMAGE TO VEHICLE STREET N N 233 5.0 14 1182892.0 1870055.0 41.798635 -87.604823
25 JB241867 04/28/2018 05:33:00 PM 002XX W 118TH ST 1310 CRIMINAL DAMAGE TO PROPERTY RESIDENCE N Y 522 34.0 14 1176607.0 1826687.0 41.679772 -87.629173
31 JA409885 08/28/2017 05:15:00 PM 017XX W LUNT AVE 1310 CRIMINAL DAMAGE TO PROPERTY OTHER N N 2423 49.0 14 1163708.0 1946562.0 42.009001 -87.673015
34 JB241890 04/28/2018 06:00:00 AM 021XX W 73RD ST 1310 CRIMINAL DAMAGE TO PROPERTY RESIDENCE N Y 735 17.0 14 1163475.0 1856233.0 41.761136 -87.676417
38 JB241787 04/28/2018 06:10:00 PM 071XX S ABERDEEN ST 1310 CRIMINAL DAMAGE TO PROPERTY RESIDENCE N N 733 17.0 14 1170216.0 1857494.0 41.764452 -87.651674

In [275]:
crimes[(crimes['PRIMARY_DESCRIPTION']=='CRIMINAL DAMAGE')&(crimes['SECONDARY_DESCRIPTION']=='TO PROPERTY')].head()


Out[275]:
CASE# DATE_OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT WARD FBI_CD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE
25 JB241867 04/28/2018 05:33:00 PM 002XX W 118TH ST 1310 CRIMINAL DAMAGE TO PROPERTY RESIDENCE N Y 522 34.0 14 1176607.0 1826687.0 41.679772 -87.629173
31 JA409885 08/28/2017 05:15:00 PM 017XX W LUNT AVE 1310 CRIMINAL DAMAGE TO PROPERTY OTHER N N 2423 49.0 14 1163708.0 1946562.0 42.009001 -87.673015
34 JB241890 04/28/2018 06:00:00 AM 021XX W 73RD ST 1310 CRIMINAL DAMAGE TO PROPERTY RESIDENCE N Y 735 17.0 14 1163475.0 1856233.0 41.761136 -87.676417
38 JB241787 04/28/2018 06:10:00 PM 071XX S ABERDEEN ST 1310 CRIMINAL DAMAGE TO PROPERTY RESIDENCE N N 733 17.0 14 1170216.0 1857494.0 41.764452 -87.651674
48 JB242547 04/28/2018 09:00:00 PM 009XX W 70TH ST 1310 CRIMINAL DAMAGE TO PROPERTY APARTMENT N N 733 17.0 14 1171325.0 1858422.0 41.766974 -87.647582

Sorting


In [246]:
theft.sort_values('DATE_OF_OCCURRENCE', inplace=True, ascending=False)
theft.head()


/Users/mollie/.pyenv/versions/3.6.0/lib/python3.6/site-packages/ipykernel/__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
Out[246]:
CASE# DATE_OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT WARD FBI_CD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE
135370 JA568906 12/31/2017 12:53:00 PM 047XX S WOODLAWN AVE 0810 THEFT OVER $500 STREET N N 222 4.0 06 1185037.0 1874041.0 41.809523 -87.596832
148974 JA568921 12/31/2017 12:45:00 PM 030XX N BROADWAY 0860 THEFT RETAIL THEFT GROCERY FOOD STORE Y N 1934 44.0 06 1171721.0 1920469.0 41.937228 -87.644305
156577 JA568953 12/31/2017 12:45:00 PM 001XX N STATE ST 0860 THEFT RETAIL THEFT DEPARTMENT STORE Y N 111 42.0 06 1176352.0 1900927.0 41.883500 -87.627877
208009 JB102939 12/31/2017 12:30:00 PM 083XX S CICERO AVE 0820 THEFT $500 AND UNDER PARKING LOT/GARAGE(NON.RESID.) N N 834 18.0 06 1145884.0 1848805.0 41.741102 -87.741077
216640 JA568565 12/31/2017 12:15:00 AM 023XX N MILWAUKEE AVE 0870 THEFT POCKET-PICKING BAR OR TAVERN N N 1414 35.0 06 1156766.0 1915591.0 41.924159 -87.699400

In [247]:
theft.sort_values('DATE_OF_OCCURRENCE', inplace=True, ascending=True)
theft.head()


/Users/mollie/.pyenv/versions/3.6.0/lib/python3.6/site-packages/ipykernel/__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
Out[247]:
CASE# DATE_OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT WARD FBI_CD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE
234598 JB101036 01/01/2018 01:00:00 AM 003XX N WABASH AVE 0890 THEFT FROM BUILDING HOTEL/MOTEL N N 1834 42.0 06 1176556.0 1902666.0 41.888267 -87.627075
240910 JB100870 01/01/2018 01:00:00 AM 004XX N CLARK ST 0890 THEFT FROM BUILDING RESTAURANT N N 1831 42.0 06 1175463.0 1903198.0 41.889752 -87.631073
143954 JB101281 01/01/2018 01:00:00 AM 004XX N STATE ST 0870 THEFT POCKET-PICKING BAR OR TAVERN N N 1831 42.0 06 1176292.0 1903437.0 41.890389 -87.628021
94180 JB125173 01/01/2018 01:00:00 AM 007XX W GRACE ST 0890 THEFT FROM BUILDING APARTMENT N N 1925 46.0 06 1170516.0 1925645.0 41.951457 -87.648582
236262 JB100825 01/01/2018 01:00:00 AM 006XX N FRANKLIN ST 0870 THEFT POCKET-PICKING BAR OR TAVERN N N 1831 42.0 06 1174211.0 1904618.0 41.893677 -87.635628

Hmmm. Something isn't right about how this is sorting. Let's look into it.


In [248]:
theft.dtypes


Out[248]:
CASE#                     object
DATE_OF_OCCURRENCE        object
BLOCK                     object
IUCR                      object
PRIMARY_DESCRIPTION       object
SECONDARY_DESCRIPTION     object
LOCATION_DESCRIPTION      object
ARREST                    object
DOMESTIC                  object
BEAT                       int64
WARD                     float64
FBI_CD                    object
X_COORDINATE             float64
Y_COORDINATE             float64
LATITUDE                 float64
LONGITUDE                float64
dtype: object

Right now the dates are objects. To ensure they're handled correctly, they should be datetime. Let's fix that!


In [250]:
theft.DATE_OF_OCCURRENCE = pd.to_datetime(theft.DATE_OF_OCCURRENCE)


/Users/mollie/.pyenv/versions/3.6.0/lib/python3.6/site-packages/pandas/core/generic.py:3110: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value

In [251]:
theft.head()


Out[251]:
CASE# DATE_OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT WARD FBI_CD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE
234598 JB101036 2018-01-01 01:00:00 003XX N WABASH AVE 0890 THEFT FROM BUILDING HOTEL/MOTEL N N 1834 42.0 06 1176556.0 1902666.0 41.888267 -87.627075
240910 JB100870 2018-01-01 01:00:00 004XX N CLARK ST 0890 THEFT FROM BUILDING RESTAURANT N N 1831 42.0 06 1175463.0 1903198.0 41.889752 -87.631073
143954 JB101281 2018-01-01 01:00:00 004XX N STATE ST 0870 THEFT POCKET-PICKING BAR OR TAVERN N N 1831 42.0 06 1176292.0 1903437.0 41.890389 -87.628021
94180 JB125173 2018-01-01 01:00:00 007XX W GRACE ST 0890 THEFT FROM BUILDING APARTMENT N N 1925 46.0 06 1170516.0 1925645.0 41.951457 -87.648582
236262 JB100825 2018-01-01 01:00:00 006XX N FRANKLIN ST 0870 THEFT POCKET-PICKING BAR OR TAVERN N N 1831 42.0 06 1174211.0 1904618.0 41.893677 -87.635628

In [252]:
theft.sort_values('DATE_OF_OCCURRENCE', inplace=True, ascending=True)
theft.head()


/Users/mollie/.pyenv/versions/3.6.0/lib/python3.6/site-packages/ipykernel/__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
Out[252]:
CASE# DATE_OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT WARD FBI_CD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE
206961 JA280087 2017-05-18 06:00:00 0000X E ERIE ST 0890 THEFT FROM BUILDING RESIDENCE-GARAGE N N 1834 42.0 06 1176342.0 1904782.0 41.894079 -87.627797
45331 JA273488 2017-05-18 07:00:00 024XX N RACINE AVE 0820 THEFT $500 AND UNDER RESIDENCE PORCH/HALLWAY N N 1932 32.0 06 1167900.0 1916271.0 41.925792 -87.658470
198095 JA269086 2017-05-18 07:35:00 0000X W TERMINAL ST 0890 THEFT FROM BUILDING AIRPORT BUILDING NON-TERMINAL - SECURE AREA N N 1651 41.0 06 1100317.0 1935229.0 41.979006 -87.906463
246179 JA269617 2017-05-18 07:40:00 009XX W BELMONT AVE 0820 THEFT $500 AND UNDER CTA STATION N N 1933 44.0 06 1169269.0 1921430.0 41.939918 -87.653289
94397 JA298230 2017-05-18 08:00:00 0000X W MONROE ST 0870 THEFT POCKET-PICKING CTA TRAIN N N 112 42.0 06 1176289.0 1899922.0 41.880744 -87.628138

loc vs iloc

You can see that the row labels for the first 5 rows are NOT 0, 1, 2, 3, and 4. If we wanted to select the first five rows, we can use DataFrame.iloc[] method to select by position. If you want to select the rows with labels 0 through 4, you would use DataFrame.loc[].

The easy way to remember which is which is to remember that iloc[] stands for integer location, because you use integers and not labels to select the data.


In [253]:
#print the first five rows of theft data
theft.iloc[0:5]


Out[253]:
CASE# DATE_OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT WARD FBI_CD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE
206961 JA280087 2017-05-18 06:00:00 0000X E ERIE ST 0890 THEFT FROM BUILDING RESIDENCE-GARAGE N N 1834 42.0 06 1176342.0 1904782.0 41.894079 -87.627797
45331 JA273488 2017-05-18 07:00:00 024XX N RACINE AVE 0820 THEFT $500 AND UNDER RESIDENCE PORCH/HALLWAY N N 1932 32.0 06 1167900.0 1916271.0 41.925792 -87.658470
198095 JA269086 2017-05-18 07:35:00 0000X W TERMINAL ST 0890 THEFT FROM BUILDING AIRPORT BUILDING NON-TERMINAL - SECURE AREA N N 1651 41.0 06 1100317.0 1935229.0 41.979006 -87.906463
246179 JA269617 2017-05-18 07:40:00 009XX W BELMONT AVE 0820 THEFT $500 AND UNDER CTA STATION N N 1933 44.0 06 1169269.0 1921430.0 41.939918 -87.653289
94397 JA298230 2017-05-18 08:00:00 0000X W MONROE ST 0870 THEFT POCKET-PICKING CTA TRAIN N N 112 42.0 06 1176289.0 1899922.0 41.880744 -87.628138

In [254]:
#print first ten rows of theft data
theft.iloc[0:10]


Out[254]:
CASE# DATE_OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT WARD FBI_CD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE
206961 JA280087 2017-05-18 06:00:00 0000X E ERIE ST 0890 THEFT FROM BUILDING RESIDENCE-GARAGE N N 1834 42.0 06 1176342.0 1904782.0 41.894079 -87.627797
45331 JA273488 2017-05-18 07:00:00 024XX N RACINE AVE 0820 THEFT $500 AND UNDER RESIDENCE PORCH/HALLWAY N N 1932 32.0 06 1167900.0 1916271.0 41.925792 -87.658470
198095 JA269086 2017-05-18 07:35:00 0000X W TERMINAL ST 0890 THEFT FROM BUILDING AIRPORT BUILDING NON-TERMINAL - SECURE AREA N N 1651 41.0 06 1100317.0 1935229.0 41.979006 -87.906463
246179 JA269617 2017-05-18 07:40:00 009XX W BELMONT AVE 0820 THEFT $500 AND UNDER CTA STATION N N 1933 44.0 06 1169269.0 1921430.0 41.939918 -87.653289
94397 JA298230 2017-05-18 08:00:00 0000X W MONROE ST 0870 THEFT POCKET-PICKING CTA TRAIN N N 112 42.0 06 1176289.0 1899922.0 41.880744 -87.628138
92868 JA268665 2017-05-18 08:00:00 069XX S ASHLAND AVE 0880 THEFT PURSE-SNATCHING CTA BUS STOP N N 735 17.0 06 1166876.0 1858796.0 41.768097 -87.663879
13140 JA268962 2017-05-18 08:00:00 015XX S WABASH AVE 0820 THEFT $500 AND UNDER STREET N N 131 3.0 06 1177014.0 1892876.0 41.861393 -87.625690
173895 JA269492 2017-05-18 08:00:00 055XX S LOWE AVE 0810 THEFT OVER $500 SCHOOL, PRIVATE, BUILDING N N 711 3.0 06 1172909.0 1868109.0 41.793522 -87.641490
223050 JA269003 2017-05-18 08:15:00 034XX N AUSTIN AVE 0820 THEFT $500 AND UNDER SCHOOL, PUBLIC, BUILDING N N 1633 38.0 06 1135778.0 1922071.0 41.942340 -87.776365
71935 JA268887 2017-05-18 08:15:00 078XX S RACINE AVE 0810 THEFT OVER $500 OTHER N N 612 17.0 06 1169677.0 1852985.0 41.752090 -87.653780

In [255]:
#print the rows with index label 12
theft.loc[12]


Out[255]:
CASE#                               JB243959
DATE_OF_OCCURRENCE       2018-04-28 20:00:00
BLOCK                    068XX S ASHLAND AVE
IUCR                                    0890
PRIMARY_DESCRIPTION                    THEFT
SECONDARY_DESCRIPTION          FROM BUILDING
LOCATION_DESCRIPTION      SMALL RETAIL STORE
ARREST                                     N
DOMESTIC                                   N
BEAT                                     725
WARD                                      17
FBI_CD                                    06
X_COORDINATE                     1.16685e+06
Y_COORDINATE                     1.85946e+06
LATITUDE                             41.7699
LONGITUDE                            -87.664
Name: 12, dtype: object

In [256]:
# print the row at the fifth position
theft.iloc[4]


Out[256]:
CASE#                               JA298230
DATE_OF_OCCURRENCE       2017-05-18 08:00:00
BLOCK                      0000X W MONROE ST
IUCR                                    0870
PRIMARY_DESCRIPTION                    THEFT
SECONDARY_DESCRIPTION         POCKET-PICKING
LOCATION_DESCRIPTION               CTA TRAIN
ARREST                                     N
DOMESTIC                                   N
BEAT                                     112
WARD                                      42
FBI_CD                                    06
X_COORDINATE                     1.17629e+06
Y_COORDINATE                     1.89992e+06
LATITUDE                             41.8807
LONGITUDE                           -87.6281
Name: 94397, dtype: object

Applying functions to series and creating new columns


In [257]:
scores = pd.read_csv('fandango_score_comparison.csv')

In [258]:
scores.head()


Out[258]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm ... IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
0 Avengers: Age of Ultron (2015) 74 86 66 7.1 7.8 5.0 4.5 3.70 4.3 ... 3.90 3.5 4.5 3.5 3.5 4.0 1330 271107 14846 0.5
1 Cinderella (2015) 85 80 67 7.5 7.1 5.0 4.5 4.25 4.0 ... 3.55 4.5 4.0 3.5 4.0 3.5 249 65709 12640 0.5
2 Ant-Man (2015) 80 90 64 8.1 7.8 5.0 4.5 4.00 4.5 ... 3.90 4.0 4.5 3.0 4.0 4.0 627 103660 12055 0.5
3 Do You Believe? (2015) 18 84 22 4.7 5.4 5.0 4.5 0.90 4.2 ... 2.70 1.0 4.0 1.0 2.5 2.5 31 3136 1793 0.5
4 Hot Tub Time Machine 2 (2015) 14 28 29 3.4 5.1 3.5 3.0 0.70 1.4 ... 2.55 0.5 1.5 1.5 1.5 2.5 88 19560 1021 0.5

5 rows × 22 columns


In [259]:
scores.describe()


Out[259]:
RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm Metacritic_norm ... IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
count 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 ... 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000 146.000000
mean 60.849315 63.876712 58.808219 6.519178 6.736986 4.089041 3.845205 3.042466 3.193836 2.940411 ... 3.368493 3.065068 3.226027 2.972603 3.270548 3.380137 185.705479 42846.205479 3848.787671 0.243836
std 30.168799 20.024430 19.517389 1.510712 0.958736 0.540386 0.502831 1.508440 1.001222 0.975869 ... 0.479368 1.514600 1.007014 0.990961 0.788116 0.502767 316.606515 67406.509171 6357.778617 0.152665
min 5.000000 20.000000 13.000000 2.400000 4.000000 3.000000 2.700000 0.250000 1.000000 0.650000 ... 2.000000 0.500000 1.000000 0.500000 1.000000 2.000000 4.000000 243.000000 35.000000 0.000000
25% 31.250000 50.000000 43.500000 5.700000 6.300000 3.500000 3.500000 1.562500 2.500000 2.175000 ... 3.150000 1.500000 2.500000 2.125000 3.000000 3.000000 33.250000 5627.000000 222.250000 0.100000
50% 63.500000 66.500000 59.000000 6.850000 6.900000 4.000000 3.900000 3.175000 3.325000 2.950000 ... 3.450000 3.000000 3.500000 3.000000 3.500000 3.500000 72.500000 19103.000000 1446.000000 0.200000
75% 89.000000 81.000000 75.000000 7.500000 7.400000 4.500000 4.200000 4.450000 4.050000 3.750000 ... 3.700000 4.500000 4.000000 4.000000 4.000000 3.500000 168.500000 45185.750000 4439.500000 0.400000
max 100.000000 94.000000 94.000000 9.600000 8.600000 5.000000 4.800000 5.000000 4.700000 4.700000 ... 4.300000 5.000000 4.500000 4.500000 5.000000 4.500000 2375.000000 334164.000000 34846.000000 0.500000

8 rows × 21 columns


In [260]:
scores.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146 entries, 0 to 145
Data columns (total 22 columns):
FILM                          146 non-null object
RottenTomatoes                146 non-null int64
RottenTomatoes_User           146 non-null int64
Metacritic                    146 non-null int64
Metacritic_User               146 non-null float64
IMDB                          146 non-null float64
Fandango_Stars                146 non-null float64
Fandango_Ratingvalue          146 non-null float64
RT_norm                       146 non-null float64
RT_user_norm                  146 non-null float64
Metacritic_norm               146 non-null float64
Metacritic_user_nom           146 non-null float64
IMDB_norm                     146 non-null float64
RT_norm_round                 146 non-null float64
RT_user_norm_round            146 non-null float64
Metacritic_norm_round         146 non-null float64
Metacritic_user_norm_round    146 non-null float64
IMDB_norm_round               146 non-null float64
Metacritic_user_vote_count    146 non-null int64
IMDB_user_vote_count          146 non-null int64
Fandango_votes                146 non-null int64
Fandango_Difference           146 non-null float64
dtypes: float64(15), int64(6), object(1)
memory usage: 25.2+ KB

In [261]:
scores.IMDB.mean()


Out[261]:
6.736986301369863

In [262]:
scores.IMDB.describe()


Out[262]:
count    146.000000
mean       6.736986
std        0.958736
min        4.000000
25%        6.300000
50%        6.900000
75%        7.400000
max        8.600000
Name: IMDB, dtype: float64

In [263]:
max_IMDB = scores.IMDB.max()
max_IMDB


Out[263]:
8.6

In [264]:
min_IMDB = scores.IMDB.min()
min_IMDB


Out[264]:
4.0

In [266]:
# Return the list of movies with the lowest score:
scores[scores.IMDB == min_IMDB]


Out[266]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm ... IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
48 Fantastic Four (2015) 9 20 27 2.5 4.0 3.0 2.7 0.45 1.0 ... 2.0 0.5 1.0 1.5 1.5 2.0 421 39838 6288 0.3

1 rows × 22 columns


In [265]:
#Return the list of movies with the highest score:
scores[scores.IMDB == max_IMDB]


Out[265]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm ... IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
140 Inside Out (2015) 98 90 94 8.9 8.6 4.5 4.5 4.9 4.5 ... 4.3 5.0 4.5 4.5 4.5 4.5 807 96252 15749 0.0

1 rows × 22 columns


In [267]:
# Movies with the highest RottenTomatoes rating
scores[scores.RottenTomatoes == scores.RottenTomatoes.max()]


Out[267]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm ... IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
70 Seymour: An Introduction (2015) 100 87 83 6.0 7.7 4.5 4.2 5.0 4.35 ... 3.85 5.0 4.5 4.0 3.0 4.0 4 243 41 0.3
144 Gett: The Trial of Viviane Amsalem (2015) 100 81 90 7.3 7.8 3.5 3.5 5.0 4.05 ... 3.90 5.0 4.0 4.5 3.5 4.0 19 1955 59 0.0

2 rows × 22 columns


In [269]:
# Movies with the lowest RottenTomatoes rating
scores[scores.RottenTomatoes == scores.RottenTomatoes.min()]


Out[269]:
FILM RottenTomatoes RottenTomatoes_User Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue RT_norm RT_user_norm ... IMDB_norm RT_norm_round RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count Fandango_votes Fandango_Difference
133 Paul Blart: Mall Cop 2 (2015) 5 36 13 2.4 4.3 3.5 3.5 0.25 1.8 ... 2.15 0.5 2.0 0.5 1.0 2.0 211 15004 3054 0.0

1 rows × 22 columns


In [ ]:

Now we can plot the series with ease!

Groupby!

Groupby is a powerful method that makes it easy to peform operations on the dataframe by categorial values. Let's try generating a plot of min, max, and average temp over time.


In [150]:
crimes.groupby('PRIMARY_DESCRIPTION').size()


Out[150]:
PRIMARY_DESCRIPTION
ARSON                                  382
ASSAULT                              19580
BATTERY                              48917
BURGLARY                             12104
CONCEALED CARRY LICENSE VIOLATION       83
CRIM SEXUAL ASSAULT                   1515
CRIMINAL DAMAGE                      28085
CRIMINAL TRESPASS                     6881
DECEPTIVE PRACTICE                   17672
GAMBLING                               180
HOMICIDE                               627
HUMAN TRAFFICKING                        8
INTERFERENCE WITH PUBLIC OFFICER      1151
INTIMIDATION                           138
KIDNAPPING                             191
LIQUOR LAW VIOLATION                   207
MOTOR VEHICLE THEFT                  10815
NARCOTICS                            11448
NON-CRIMINAL                            45
NON-CRIMINAL (SUBJECT SPECIFIED)         1
OBSCENITY                               91
OFFENSE INVOLVING CHILDREN            2222
OTHER NARCOTIC VIOLATION                10
OTHER OFFENSE                        16407
PROSTITUTION                           649
PUBLIC INDECENCY                        10
PUBLIC PEACE VIOLATION                1427
ROBBERY                              11292
SEX OFFENSE                            970
STALKING                               168
THEFT                                63821
WEAPONS VIOLATION                     4867
dtype: int64

In [136]:
crimes.groupby(['PRIMARY_DESCRIPTION', 'SECONDARY_DESCRIPTION']).size()


Out[136]:
PRIMARY_DESCRIPTION  SECONDARY_DESCRIPTION                                      
ARSON                AGGRAVATED                                                        53
                     ATTEMPT ARSON                                                     46
                     BY EXPLOSIVE                                                       4
                     BY FIRE                                                          276
                     POS: EXPLOSIVE/INCENDIARY DEV                                      3
ASSAULT              AGG PO HANDS NO/MIN INJURY                                       222
                     AGG PRO.EMP: HANDGUN                                              37
                     AGG PRO.EMP: OTHER DANG WEAPON                                    69
                     AGG PRO.EMP: OTHER FIREARM                                         1
                     AGG PRO.EMP:KNIFE/CUTTING INST                                    42
                     AGGRAVATED PO: HANDGUN                                            37
                     AGGRAVATED PO: OTHER DANG WEAP                                    54
                     AGGRAVATED PO: OTHER FIREARM                                       2
                     AGGRAVATED PO:KNIFE/CUT INSTR                                     27
                     AGGRAVATED: HANDGUN                                             2886
                     AGGRAVATED: OTHER DANG WEAPON                                   1086
                     AGGRAVATED: OTHER FIREARM                                         87
                     AGGRAVATED:KNIFE/CUTTING INSTR                                  1518
                     PRO EMP HANDS NO/MIN INJURY                                      592
                     SIMPLE                                                         12920
BATTERY              AGG PO HANDS ETC SERIOUS INJ                                      29
                     AGG PO HANDS NO/MIN INJURY                                       543
                     AGG PRO EMP HANDS SERIOUS INJ                                     31
                     AGG PRO.EMP: OTHER DANG WEAPON                                    66
                     AGG PRO.EMP:KNIFE/CUTTING INST                                     9
                     AGG: HANDS/FIST/FEET NO/MINOR INJURY                             190
                     AGG: HANDS/FIST/FEET SERIOUS INJURY                              141
                     AGGRAVATED DOMESTIC BATTERY: HANDGUN                              11
                     AGGRAVATED DOMESTIC BATTERY: HANDS/FIST/FEET SERIOUS INJURY      482
                     AGGRAVATED DOMESTIC BATTERY: KNIFE/CUTTING INST                  717
                                                                                    ...  
SEX OFFENSE          INDECENT SOLICITATION/CHILD                                       21
                     OTHER                                                             41
                     PUBLIC INDECENCY                                                 210
                     SEX RELATION IN FAMILY                                             2
                     SEXUAL EXPLOITATION OF A CHILD                                    37
STALKING             CYBERSTALKING                                                     20
                     SIMPLE                                                           131
                     VIOLATION OF STALKING NO CONTACT ORDER                            17
THEFT                $500 AND UNDER                                                 24343
                     ATTEMPT THEFT                                                    434
                     DELIVERY CONTAINER THEFT                                          32
                     FROM BUILDING                                                  10487
                     FROM COIN-OP MACHINE/DEVICE                                       18
                     OVER $500                                                      14769
                     POCKET-PICKING                                                  2243
                     PURSE-SNATCHING                                                  623
                     RETAIL THEFT                                                   10872
WEAPONS VIOLATION    DEFACE IDENT MARKS OF FIREARM                                      1
                     POSS FIREARM/AMMO:NO FOID CARD                                    74
                     RECKLESS FIREARM DISCHARGE                                       375
                     UNLAWFUL POSS AMMUNITION                                         102
                     UNLAWFUL POSS OF HANDGUN                                        3608
                     UNLAWFUL POSS OTHER FIREARM                                      113
                     UNLAWFUL SALE HANDGUN                                              4
                     UNLAWFUL SALE/DELIVERY OF FIREARM AT SCHOOL                        2
                     UNLAWFUL USE HANDGUN                                             376
                     UNLAWFUL USE OTHER DANG WEAPON                                   117
                     UNLAWFUL USE OTHER FIREARM                                        50
                     UNLAWFUL USE/SALE AIR RIFLE                                       44
                     USE OF METAL PIERCING BULLETS                                      1
Length: 328, dtype: int64

In [ ]:


In [151]:
crimes.head()


Out[151]:
CASE# DATE__OF_OCCURRENCE BLOCK IUCR PRIMARY_DESCRIPTION SECONDARY_DESCRIPTION LOCATION_DESCRIPTION ARREST DOMESTIC BEAT WARD FBI_CD X_COORDINATE Y_COORDINATE LATITUDE LONGITUDE LOCATION
0 JB241987 04/28/2018 10:05:00 PM 009XX N LONG AVE 2092 NARCOTICS SOLICIT NARCOTICS ON PUBLICWAY SIDEWALK Y N 1524 37.0 18 1140136.0 1905903.0 41.897895 -87.760744 (41.897894893, -87.760743714)
1 JA430240 09/06/2017 01:30:00 PM 032XX W 26TH ST 0810 THEFT OVER $500 OTHER Y N 1024 12.0 06 1155313.0 1886555.0 41.844510 -87.705519 (41.844510467, -87.705519454)
2 JB241350 04/28/2018 08:00:00 AM 008XX E 53RD ST 1320 CRIMINAL DAMAGE TO VEHICLE STREET N N 233 5.0 14 1182892.0 1870055.0 41.798635 -87.604823 (41.798635468, -87.604823241)
3 JB245397 04/28/2018 09:00:00 AM 062XX S MICHIGAN AVE 0820 THEFT $500 AND UNDER RESIDENCE PORCH/HALLWAY N N 311 20.0 06 1178263.0 1863570.0 41.780946 -87.621995 (41.780946398, -87.621995369)
4 JB241444 04/28/2018 12:15:00 PM 046XX N ELSTON AVE 0890 THEFT FROM BUILDING SMALL RETAIL STORE N N 1722 39.0 06 1146646.0 1930549.0 41.965404 -87.736202 (41.965404069, -87.736202402)

In [156]:
crimes.DATE__OF_OCCURRENCE = pd.to_datetime(crimes.DATE__OF_OCCURRENCE)

In [165]:
crimes['year'] = crimes.DATE__OF_OCCURRENCE.map(lambda x: x.year)

In [164]:
crimes['month'] = crimes.DATE__OF_OCCURRENCE.map(lambda x: x.month)

In [168]:
month_year_crimes = crimes.groupby(['year', 'month']).size()
month_year_crimes


Out[168]:
year  month
2017  5        10548
      6        23711
      7        24735
      8        24602
      9        22704
      10       22741
      11       21325
      12       20855
2018  1        20021
      2        17011
      3        20738
      4        20440
      5        12533
dtype: int64

In [169]:
month_year_crimes.plot()


Out[169]:
<matplotlib.axes._subplots.AxesSubplot at 0x11e814898>

In [170]:
month_year_crimes.hist()


Out[170]:
<matplotlib.axes._subplots.AxesSubplot at 0x11e88b160>

In [ ]: