Campus Crime Data Analysis - Project Report

In this notebook, we summarize our findings for the presentation.

Problems

Our analysis of campus security data from 2010 to 2012 is based on raw data from the U.S. Department of Education. The main question we address is the relationship between various campus crimes and factors such as institution location, sector, and sex ratio. More specifically, we did a case study on eight main campuses of the University of California to explore differences in campus security.

Data Source

The U.S. Department of Education provides raw campus security data for 2010-2012 as Excel files on its website; we converted the Excel files to .csv. Each dataset covers over 10,000 postsecondary institutions, with counts for different types of crime as well as information about the institutions, such as private/public status, gender ratio, and geographical location. We chose the "on-campus crime" file as our focus. The data provided by the government website are comprehensive, but there are some minor typos that we needed to fix when loading the data.
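
The report does not list which typos were fixed. As a rough sketch of the loading step, assuming the problems are of the stray-whitespace, inconsistent-case, and blank-cell kind, the following reads a converted .csv with the standard library. The column names (INSTNM, State, BURGLA12, ROBBE12) and the sample rows are hypothetical placeholders, not the actual file contents.

```python
import csv
import io

# Hypothetical sample standing in for one converted .csv file; the real
# column names in the Department of Education files may differ.
SAMPLE_CSV = """INSTNM,State,BURGLA12, ROBBE12
Example College A,CA,3,0
Example College B,MO,,1
"""

TEXT_COLUMNS = ("INSTNM", "STATE")


def load_crime_rows(handle):
    """Read rows, tidying header spelling/whitespace and blank count cells."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {}
        for key, value in raw.items():
            name = key.strip().upper()  # remove stray spaces, unify case
            value = (value or "").strip()
            if name in TEXT_COLUMNS:
                row[name] = value
            else:
                # Assumption: a blank count cell is treated as zero here.
                row[name] = int(value) if value else 0
        rows.append(row)
    return rows


rows = load_crime_rows(io.StringIO(SAMPLE_CSV))
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle. Whether a blank cell should count as zero or as missing is a choice that would need checking against the data documentation.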

Main challenges

One major difficulty is that the file contains counts for various types of crime, whereas our analysis needs a single measure of the overall safety level of each institution. To solve this, we came up with a "crime index" to carry the analysis further. The other difficult part of the project was visualizing the relationship between location (zip code) and crime. The "maps" package helped us map the crime data to each state.
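
The exact definition of the crime index is not given in this report. One plausible form, shown here only as a sketch, is a severity-weighted sum of the crime counts scaled by enrollment. The weights, the crime-type names, and the per-1,000-students scaling below are all assumptions made for illustration.

```python
# Hypothetical severity weights; the report does not list the actual values.
SEVERITY_WEIGHTS = {
    "burglary": 1,
    "robbery": 2,
    "aggravated_assault": 3,
    "forcible_sex_offense": 4,
}


def crime_index(counts, enrollment):
    """Severity-weighted crime count per 1,000 enrolled students."""
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    weighted = sum(SEVERITY_WEIGHTS[kind] * n for kind, n in counts.items())
    return 1000.0 * weighted / enrollment


# Made-up school: 10 burglaries and 1 robbery among 4,000 students.
example = crime_index({"burglary": 10, "robbery": 1}, enrollment=4000)
```

Collapsing the counts into one number makes schools directly comparable, at the cost of hiding which crime type drives a high value.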


In [2]:
from IPython.display import Image

In [12]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clean1.png')


Out[12]:

One outlier was detected: Pennsylvania State University (PSU) has a much higher number of forcible sex offense incidents in 2012. However, the potential causal factors behind this observation are not clear to us.
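
The outlier above was read off the plot. A common numerical check, sketched here under the assumption that Tukey's 1.5 x IQR rule is acceptable for this purpose, flags any school whose count lies beyond the upper fence. The school names and counts are made up for illustration.

```python
from statistics import quantiles


def flag_outliers(counts_by_school, factor=1.5):
    """Return names whose count lies outside the Tukey fences."""
    q1, _median, q3 = quantiles(sorted(counts_by_school.values()), n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [name for name, n in counts_by_school.items() if n < low or n > high]


# Made-up incident counts, not values from the dataset.
sample_counts = {
    "School A": 0, "School B": 1, "School C": 1,
    "School D": 2, "School E": 2, "School F": 3,
    "School G": 3, "School H": 4, "School I": 56,
}
flagged = flag_outliers(sample_counts)
```

On heavily skewed count data this rule can flag many points, so the threshold `factor` may need to be raised.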


In [15]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_school1.png')


Out[15]:

No significant individual differences can be observed between public schools and private schools. However, judging by the means and the distribution of the outliers, private (non-profit) schools tend to have more campus security problems.


In [16]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_school2.png')


Out[16]:

In [17]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_school3.png')


Out[17]:

In [18]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_total1.png')


Out[18]:

In [19]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_total2.png')


Out[19]:

In [20]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_total3.png')


Out[20]:

Among the private (non-profit) schools, we did a deeper level of analysis. Schools with fewer students tend to have higher crime rates. Building on this, we further explored the schools in this subsample. We find that a small number of art schools with high crime rates skew the mean crime rate of non-profit private schools.


In [21]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_sex1.png')


Out[21]:

Because sex ratio varies across academic disciplines, we explore whether it can be a potential mediator explaining the observed differences between public and private schools. We find that schools with more women have a higher crime rate than schools with more men. This may suggest that art schools are more vulnerable to campus crime.


In [22]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_ca1.png')


Out[22]:

Using K-means clustering, we classify all the schools in California into two groups by crime index. No significant pattern can be directly observed.
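
For reference, the idea behind K-means on a single variable can be sketched in a few lines. This is a plain implementation of Lloyd's algorithm with a deterministic start, not the code used to produce the figures, and the crime index values are made up.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar values; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: pick k evenly spaced values from the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels


# Made-up crime index values for six schools.
index_values = [0.5, 0.8, 1.1, 0.9, 6.0, 7.2]
centers, labels = kmeans_1d(index_values, k=2)
```

Changing `k` to 3 or 4 repeats the same procedure with more groups, which is how the finer classifications in the later figures can be read.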


In [24]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_ca3.png')


Out[24]:

Next, we classified the available data into three groups using K-means. More clearly than in the previous graph, two outliers are detected as high-crime schools (shown as green dots): Feather River Community College District, Quincy, and Providence Christian College, Pasadena.


In [25]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_ca4.png')


Out[25]:

With a classification into four groups, the schools with a lower crime index are further divided.


In [26]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_kmeans1.png')


Out[26]:

A crime index of one denotes the less severe campus crimes, such as thefts and burglaries. The dense cluster of blue dots at the bottom left of the graph shows that burglary is the most prevalent crime type in colleges and universities.


In [27]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_kmeans2.png')


Out[27]:

With further K-means classification in the two-dimensional scenario, schools with more security issues are characterized by both qualitative and quantitative attributes.


In [3]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_kmean3.png')


Out[3]:

The schools are further grouped into three groups: low-crime (green), burglary-intense (blue), and severe-crime (red).


In [29]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_kmean4.png')


Out[29]:

This is a further classification into four groups. The low-crime group is divided into safe (red) and low-crime (blue).


In [30]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_h1.png')


Out[30]:

Hierarchical clustering is used to analyze the safety of the ten main campuses of the University of California. In the first step, two groups (safe vs. unsafe) are formed.
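
The report does not state which linkage was used. As an illustration of the bottom-up idea, the sketch below merges the closest pair of clusters until the requested number of groups remains, assuming average linkage on a single crime index value per campus. The campus labels and values are placeholders, not the UC data.

```python
def agglomerative(points, n_clusters):
    """Bottom-up clustering with average linkage; returns lists of names."""
    clusters = [[name] for name in points]

    def linkage(a, b):
        # Average absolute difference between members of the two clusters.
        total = sum(abs(points[x] - points[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Placeholder crime index values for five hypothetical campuses.
campus_index = {
    "Campus A": 1.0, "Campus B": 1.2, "Campus C": 1.1,
    "Campus D": 4.0, "Campus E": 4.3,
}
two_groups = agglomerative(campus_index, n_clusters=2)
```

Cutting the same merge sequence at four clusters instead of two gives the finer grouping; with only a handful of campuses the quadratic pair search is not a concern.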


In [31]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_h2.png')


Out[31]:

In a classification into four groups, UC Merced is found to be the safest and UC Santa Cruz the most dangerous.


In [4]:
Image(filename='/home/oski/project2/stat133-project2/examples/data/visualization/clustering_states.png')


Out[4]:

Schools in California are found to have the lowest crime rates, and schools in Missouri the highest. Policy differences need further investigation to improve campus security nationwide.
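
A state-level comparison of this kind boils down to grouping schools by state and averaging. The sketch below assumes a simple unweighted mean of per-school crime rates; the state labels and rates are placeholders, not results from the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_rate_by_state(records):
    """Average the per-school crime rate within each state."""
    by_state = defaultdict(list)
    for state, rate in records:
        by_state[state].append(rate)
    return {state: mean(rates) for state, rates in by_state.items()}


# Placeholder (state, crime rate) pairs, one per school.
sample_records = [("X", 1.0), ("X", 2.0), ("Y", 4.0)]
averages = mean_rate_by_state(sample_records)
lowest = min(averages, key=averages.get)
highest = max(averages, key=averages.get)
```

An unweighted mean gives a small school the same influence as a large one, so weighting by enrollment may change the ranking.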