Instructions
For fast processing, you only need to change the following variables before running:
Make sure you don't reuse an output name that already exists, or the previous file will be overwritten.
With the previous steps in mind, open the Cell menu and select Run All.
In [5]:
# Imports
import numpy as np
np.seterr(all='ignore')
import pandas as pd
from decimal import Decimal
import time
import csv
# Import python script with Pysegreg functions
from segregationMetrics import Segreg
# Instantiate segreg as cc
cc = Segreg()
In [6]:
cc.readAttributesFile('/Users/sandrofsousa/Downloads/valid/Segreg sample.csv')
In [11]:
start = time.time()
# adjust this path according to data source
path = "../tempos/p"
# Create a list with file names from time data to be processed.
file_list = [path + str(i) + "_TP.txt" for i in range(1, 20)]
# Create a zero-filled matrix to be updated with data (np.empty would leave
# uninitialized values in any origin/destination pair missing from the files).
# Adjust the shape to the number of areas in your data.
matrix = np.zeros([18953, 18953])
# loop over files to populate matrix according to area ID pairs compared.
for file_name in file_list:
    with open(file_name, "r") as data:
        parser = csv.reader(data, delimiter=";")
        next(parser)  # skip header
        for line in parser:
            origin = int(line[1]) - 1  # area IDs are 1-based, matrix indices 0-based
            destination = int(line[2]) - 1
            travel_time = float(line[3])
            matrix[origin, destination] = travel_time  # update position based on index
print("--- %s minutes for processing ---" % round((time.time() - start) / 60, 2))
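The parsing loop above can be checked on a tiny in-memory sample before running it on the full data. This sketch assumes the same layout the notebook relies on: semicolon-separated rows with a header, where columns 1 and 2 hold 1-based origin and destination IDs and column 3 holds the travel time. The `fill_travel_times` helper and the sample rows are hypothetical.

```python
import csv
import io

import numpy as np


def fill_travel_times(handle, matrix):
    """Write travel times from one semicolon-separated file into matrix (in place)."""
    parser = csv.reader(handle, delimiter=";")
    next(parser)  # skip header
    for line in parser:
        origin = int(line[1]) - 1       # IDs are 1-based, matrix indices 0-based
        destination = int(line[2]) - 1
        matrix[origin, destination] = float(line[3])
    return matrix


# Hypothetical sample with 3 areas; column 0 is an unused field
sample = io.StringIO(
    "row;origin;destination;time\n"
    "1;1;2;12.5\n"
    "2;2;3;7.0\n"
    "3;3;1;30.0\n"
)
travel = fill_travel_times(sample, np.zeros((3, 3)))
print(travel)
```

Pairs that never appear in the files stay at zero, which is worth keeping in mind when interpreting the time matrix.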
Compute Population Intensity based on time
For a non-spatial result, comment out the function call "cc.locality = ..." below.
At this step, the time matrix is used to compute population intensity. Change the parameters according to your needs. Parameters are:
In [ ]:
start_time = time.time()
cc.locality = cc.cal_timeMatrix(bandwidth=10, weightmethod=1, matrix=matrix)
print("--- %s seconds for processing ---" % round(time.time() - start_time, 2))
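The notebook does not document how cal_timeMatrix turns travel times into intensity, so the following only illustrates the general idea under stated assumptions: each area's intensity for a group is a kernel-weighted sum of that group's population across all areas, with weights that fall off as travel time grows relative to the bandwidth. The `time_intensity` function, its two kernels, and the sample data are hypothetical; no mapping from `weightmethod=1` to a particular kernel is assumed.

```python
import numpy as np


def time_intensity(pop, times, bandwidth, kernel="uniform"):
    """Hypothetical time-based intensity (not the Pysegreg implementation).

    pop:   (n_areas, n_groups) population counts
    times: (n_areas, n_areas) travel times, times[i, j] from area i to area j
    """
    times = np.asarray(times, dtype=float)
    if kernel == "uniform":
        # every area reachable within the bandwidth counts fully
        weights = (times <= bandwidth).astype(float)
    elif kernel == "gaussian":
        # weight decays smoothly with travel time
        weights = np.exp(-0.5 * (times / bandwidth) ** 2)
    else:
        raise ValueError("unknown kernel: %s" % kernel)
    # intensity[i, g] = sum_j weights[i, j] * pop[j, g]
    return weights @ np.asarray(pop, dtype=float)


pop = np.array([[10, 0], [5, 5], [0, 20]])
times = np.array([[0, 8, 25], [8, 0, 12], [25, 12, 0]])
print(time_intensity(pop, times, bandwidth=10))
```

A larger bandwidth pulls more distant areas into each area's intensity, so results become smoother and less local.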
For validation only
Uncomment the lines below (remove the leading #) to print the values and validate them.
In [4]:
# np.set_printoptions(threshold=np.inf)
# print('Location (coordinates from data):\n', cc.location)
# print()
# print('Population intensity for all groups:\n', cc.locality)
# To select the locality for a specific line (validation), use the index [x, :],
# where x is the number of the desired line:
# cc.locality[5, :]
Compute local Dissimilarity
In [5]:
diss_local = cc.cal_localDissimilarity()
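# Reshape into a column vector (one row per area) so it can be concatenated with the other local results
diss_local = np.asmatrix(diss_local).transpose()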
Compute global Dissimilarity
In [6]:
diss_global = cc.cal_globalDissimilarity()
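For intuition about what a global dissimilarity value measures, the classic two-group index of dissimilarity compares each area's share of group A with its share of group B: D = 0.5 * Σ_i |a_i/A − b_i/B|, where 0 means identical distributions and 1 means complete separation. Pysegreg handles n_group groups, so cal_globalDissimilarity may use a multigroup generalization; this sketch is only the textbook two-group form, and `duncan_dissimilarity` is a hypothetical name.

```python
import numpy as np


def duncan_dissimilarity(a, b):
    """Two-group index of dissimilarity D = 0.5 * sum_i |a_i/A - b_i/B|."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 0.5 * np.abs(a / a.sum() - b / b.sum()).sum()


# Three areas: group A lives in areas 0 and 2, group B in areas 1 and 2
print(duncan_dissimilarity([10, 0, 10], [0, 10, 10]))
```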
Compute local Exposure/Isolation
The exposure result covers all n_group * n_group combinations of groups, so exposure (m, n) = rs[m, n].
In the local result, the columns are exposure of m1 to n1, n2, ..., n5, then m2 to n1, ..., n5, and so on.
This gives the local exposure/isolation result for every combination of groups.
To select a specific line (area) of the m-to-n values, use the index [x].
Each value is the result of one (m, n) combination: isolation when m = n, exposure otherwise.
e.g., for two groups: g1xg1, g1xg2, g2xg1, g2xg2 = isolation, exposure, exposure, isolation
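Assuming the local exposure result is laid out as described (one row per area, columns in m-major order, the same order used later when naming the output columns iso_/exp_), the sketch below finds the column for a given (m, n) pair and reshapes one area's row back into an n_group x n_group grid. The array is a stand-in rather than real output, and `expo_column` is a hypothetical helper.

```python
import numpy as np

n_group = 2
n_areas = 3
# Stand-in for expo_local: one row per area, n_group * n_group columns
demo_expo = np.arange(n_areas * n_group * n_group).reshape(n_areas, n_group * n_group)


def expo_column(m, n, n_group):
    """Column of the (m, n) combination when columns run m0n0, m0n1, ..., m1n0, ..."""
    return m * n_group + n


# Same naming rule as the output columns: isolation on the diagonal, exposure elsewhere
labels = ["iso_%d%d" % (m, n) if m == n else "exp_%d%d" % (m, n)
          for m in range(n_group) for n in range(n_group)]
print(labels)
# Exposure of group 0 to group 1 in area x = 1
print(demo_expo[1, expo_column(0, 1, n_group)])
# One area's values back in (m, n) form
print(demo_expo[1].reshape(n_group, n_group))
```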
In [7]:
expo_local = cc.cal_localExposure()
Compute global Exposure/Isolation
In [18]:
expo_global = cc.cal_globalExposure()
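One common definition of exposure, with isolation on its diagonal, is the P* index: the average share of group n in the areas where members of group m live, E[m, n] = Σ_i (pop[i, m] / total_m) * (pop[i, n] / t_i). This sketch computes the full n_group x n_group matrix from a population table; whether cal_globalExposure uses exactly this form or a spatially weighted variant based on cc.locality is an assumption not confirmed here. Each row sums to 1, which makes a quick sanity check.

```python
import numpy as np


def exposure_matrix(pop):
    """Textbook P* exposure/isolation matrix from an (n_areas, n_groups) table."""
    pop = np.asarray(pop, dtype=float)
    share_of_group = pop / pop.sum(axis=0)                 # pop[i, m] / total_m
    share_in_area = pop / pop.sum(axis=1, keepdims=True)   # pop[i, n] / t_i
    # E[m, n] = sum_i share_of_group[i, m] * share_in_area[i, n]
    return share_of_group.T @ share_in_area


pop = np.array([[30, 10], [10, 30]])
E = exposure_matrix(pop)
print(E)
```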
Compute local Entropy
In [19]:
entro_local = cc.cal_localEntropy()
Compute global Entropy
In [20]:
entro_global = cc.cal_globalEntropy()
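As a reference for the entropy step, a common definition is the diversity entropy E = −Σ_m π_m ln π_m over group shares π_m, computed for each area's composition (local) or for the group totals of the whole region (global). The natural logarithm and this exact form are assumptions; Pysegreg may use a different base or a spatially weighted composition.

```python
import numpy as np


def local_entropy(pop):
    """Entropy of each area's group composition: E_i = -sum_m p_im * ln(p_im)."""
    pop = np.asarray(pop, dtype=float)
    shares = pop / pop.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(shares > 0, shares * np.log(shares), 0.0)  # 0 * ln 0 taken as 0
    return -terms.sum(axis=1)


def global_entropy(pop):
    """Entropy of the whole region's composition, using group totals."""
    totals = np.asarray(pop, dtype=float).sum(axis=0, keepdims=True)
    return local_entropy(totals)[0]


pop = np.array([[50, 50], [100, 0]])
print(local_entropy(pop))   # ln 2 for the mixed area, 0 for the single-group area
print(global_entropy(pop))
```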
Compute local Index H
In [21]:
idxh_local = cc.cal_localIndexH()
Compute global Index H
In [22]:
idxh_global = cc.cal_globalIndexH()
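Index H is commonly defined as Theil's information theory index, which compares each area's entropy E_i with the regional entropy E, weighted by area population t_i: H = Σ_i t_i (E − E_i) / (E T), where T is the total population. H is 0 when every area has the regional mix and 1 when each area holds a single group. This self-contained sketch uses non-spatial compositions; treating it as exactly what cal_globalIndexH computes is an assumption, and `theil_index_h` is a hypothetical name.

```python
import numpy as np


def entropy_rows(pop):
    """-sum p * ln(p) for each row's composition (0 * ln 0 taken as 0)."""
    shares = pop / pop.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(shares > 0, shares * np.log(shares), 0.0)
    return -terms.sum(axis=1)


def theil_index_h(pop):
    """H = sum_i t_i * (E - E_i) / (E * T); 0 = no segregation, 1 = complete."""
    pop = np.asarray(pop, dtype=float)
    t = pop.sum(axis=1)                                   # population of each area
    T = t.sum()                                           # total population
    E_i = entropy_rows(pop)                               # local entropy
    E = entropy_rows(pop.sum(axis=0, keepdims=True))[0]   # global entropy
    return float((t * (E - E_i)).sum() / (E * T))


print(theil_index_h([[50, 50], [50, 50]]))   # same mix everywhere
print(theil_index_h([[100, 0], [0, 100]]))   # each area holds one group
```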
In [23]:
# Concatenate local values from measures
if len(cc.locality) == 0:
    results = np.concatenate((expo_local, diss_local, entro_local, idxh_local), axis=1)
else:
    results = np.concatenate((cc.locality, expo_local, diss_local, entro_local, idxh_local), axis=1)
# Concatenate the results with original data
output = np.concatenate((cc.tract_id, cc.attributeMatrix, results), axis=1)
In [24]:
# Column names: id, coordinates and the population of each group
names = ['id', 'x', 'y']
for i in range(cc.n_group):
    names.append('group_' + str(i))
# Population intensity columns exist only for the spatial version
if len(cc.locality) != 0:
    for i in range(cc.n_group):
        names.append('intens_' + str(i))
# One column per group pair: isolation when i == j, exposure otherwise
for i in range(cc.n_group):
    for j in range(cc.n_group):
        if i == j:
            names.append('iso_' + str(i) + str(j))
        else:
            names.append('exp_' + str(i) + str(j))
names.append('dissimil')
names.append('entropy')
names.append('indexh')
Save Local and global results to a file
The parameter fname sets the output folder and base file name; change it as you want.
To save to a different folder, include the directory in the path using "/" separators.
The local results are saved under the name defined, with the "_local" suffix added (fname_local.csv).
The global results are saved automatically under the same name with the "_global" suffix (fname_global.txt).
It's recommended to save to a folder separate from the code, e.g. a folder named result.
Change the fname value for every new execution, or the previous files will be overwritten!
In [34]:
fname = "/Users/sandrofsousa/Downloads/valid/result"
output = pd.DataFrame(output, columns=names)
output.to_csv("%s_local.csv" % fname, sep=",", index=False)
with open("%s_global.txt" % fname, "w") as f:
    f.write('Global dissimilarity: ' + str(diss_global))
    f.write('\nGlobal entropy: ' + str(entro_global))
    f.write('\nGlobal Index H: ' + str(idxh_global))
    f.write('\nGlobal isolation/exposure: \n')
    f.write(str(expo_global))
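Because reusing fname silently replaces earlier output, it can help to guard the save step. The hypothetical `save_results` helper below keeps the notebook's naming (fname_local.csv and fname_global.txt), creates the target folder if needed, and raises FileExistsError unless overwrite=True. The sample data and global lines are placeholders, and a temporary folder is used so the example does not touch real results.

```python
import os
import tempfile

import pandas as pd


def save_results(fname, local_df, global_lines, overwrite=False):
    """Write fname_local.csv and fname_global.txt, creating the folder if needed.

    Raises FileExistsError instead of silently replacing earlier results.
    """
    local_path = "%s_local.csv" % fname
    global_path = "%s_global.txt" % fname
    if not overwrite:
        for path in (local_path, global_path):
            if os.path.exists(path):
                raise FileExistsError(path)
    folder = os.path.dirname(fname)
    if folder:
        os.makedirs(folder, exist_ok=True)
    local_df.to_csv(local_path, sep=",", index=False)
    with open(global_path, "w") as f:
        f.write("\n".join(global_lines))
    return local_path, global_path


# Usage with throwaway data in a temporary folder
tmp = tempfile.mkdtemp()
demo_fname = os.path.join(tmp, "result", "run1")
df = pd.DataFrame({"id": [1, 2], "dissimil": [0.1, 0.2]})
paths = save_results(demo_fname, df, ["Global dissimilarity: 0.15"])
print(paths)
```

Running the same call a second time raises FileExistsError, which is a prompt to pick a new fname.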