Pysegreg run - Time based

Instructions

For a quick run, you only need to change the following before running:

path/name in the Input file cell (select the file you want to use)
path in the Input data to generate Time Matrix cell (the time matrix is built from local files)
bandwidth and weight method in the Compute Population Intensity cell
file name in the variable fname in the section Save Local and global results to a file (where the results will be saved)

Make sure the name is not already in use, or the existing file will be replaced.

With the previous steps done, click on the Cell menu and select Run All.



In [5]:

    
# Imports
import numpy as np
np.seterr(all='ignore')
import pandas as pd
from decimal import Decimal
import time
import csv

# Import python script with Pysegreg functions
from segregationMetrics import Segreg

# Instantiate segreg as cc
cc = Segreg()

Input file

Attention: the input file must follow the new data structure!

Set the path/name of the input file to be processed in the cell below.



In [6]:

    
cc.readAttributesFile('/Users/sandrofsousa/Downloads/valid/Segreg sample.csv')









    Out[6]:





matrix([[  3.33245530e+05,   7.39477232e+06,   7.70000000e-01, ...,
           3.65000000e+00,   1.49000000e+00,   3.45000000e+00],
        [  3.33657950e+05,   7.39531053e+06,   5.10000000e-01, ...,
           8.41000000e+00,   2.12000000e+00,   1.14000000e+00],
        [  3.33381780e+05,   7.39420259e+06,   1.42000000e+00, ...,
           2.20800000e+01,   2.88000000e+01,   9.11000000e+00],
        ..., 
        [  3.00003210e+05,   7.39509971e+06,   2.00000000e-01, ...,
           7.60000000e-01,   2.49000000e+00,   1.69000000e+00],
        [  3.04217510e+05,   7.40542844e+06,   5.00000000e-02, ...,
           1.40000000e-01,   6.20000000e-01,   2.20000000e-01],
        [  2.97174290e+05,   7.41325052e+06,   0.00000000e+00, ...,
           1.00000000e-02,   1.30000000e-01,   7.00000000e-02]])

Input data to generate Time Matrix

Set the variable path to the correct data source.
Clean data is already available in the project's Dropbox folder.



In [11]:

    
# record the start time (used by the timing message at the end of this cell)
start = time.time()

# adjust this path according to data source
path = "../tempos/p"

# Create a list with file names from time data to be processed.
file_list = [path + str(i) +"_TP.txt" for i in range(1, 20)]

# create an empty matrix to be updated with data
# (np.empty does not initialize values: pairs absent from the files keep arbitrary contents)
matrix = np.empty([18953, 18953])  # adjust this shape if the number of areas changes

# loop over files to populate the matrix according to the area ID pairs compared.
for file in file_list:
    with open(file, "r") as data:
        parser = csv.reader(data, delimiter=";")
        next(parser)  # skip header
        for line in parser:
            origin = int(line[1]) - 1       # area IDs are 1-based, indices 0-based
            destination = int(line[2]) - 1
            travel_time = float(line[3])
            matrix[origin, destination] = travel_time  # update position based on index

print("--- %s minutes for processing ---" % round((time.time() - start)/60, 2))









    



--- 1.21 minutes for processing ---
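
As a small self-contained illustration of the loop above, the sketch below fills a 3x3 matrix from in-memory text instead of files. The sample records, the header names and the variable `toy_matrix` are invented for the example; only the `;` delimiter, the field positions and the shift from 1-based IDs to 0-based indices are taken from the cell above. Starting from NaN rather than `np.empty` is a choice made here so that missing pairs are easy to count.

```python
import csv
import io

import numpy as np

# Hypothetical sample in the same layout as the cell above:
# header line, then row;origin;destination;time with 1-based area IDs.
sample = io.StringIO(
    "row;origin;destination;time\n"
    "1;1;2;12.5\n"
    "2;2;1;12.5\n"
    "3;1;3;30.0\n"
    "4;3;1;30.0\n"
)

n_areas = 3
# Start from NaN so that pairs missing from the input are easy to spot
toy_matrix = np.full((n_areas, n_areas), np.nan)

parser = csv.reader(sample, delimiter=";")
next(parser)  # skip header
for line in parser:
    origin = int(line[1]) - 1        # convert 1-based ID to 0-based index
    destination = int(line[2]) - 1
    toy_matrix[origin, destination] = float(line[3])

missing = int(np.isnan(toy_matrix).sum())
print(toy_matrix)
print("pairs without a travel time:", missing)
```

Counting the NaN entries after loading is a quick way to check whether the input files cover every origin/destination pair.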

Measures

Compute Population Intensity based on time

For a non-spatial result, comment out the function call at: "cc.locality = ..."

To comment out a line of code, put # at the beginning of the line.

The time matrix is used to compute population intensity at this step. Change the parameters according to your needs. Parameters are:

bandwidth - is set to 5000m by default; you can change it here
weightmethod - 1 for Gaussian, 2 for bi-square, and empty for moving window
matrix - time matrix in a symmetric shape (the variable computed previously)
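
To give an intuition for what `bandwidth` and `weightmethod` control, the sketch below applies the textbook Gaussian and bi-square kernels to a tiny travel-time matrix. This is not Pysegreg's code: the function `kernel_weights`, the array `toy_times` and the exact formulas are assumptions for illustration, and `cal_timeMatrix` may differ in detail (for example in normalization or cutoff).

```python
import numpy as np


def kernel_weights(times, bandwidth, weightmethod=None):
    """Hypothetical helper: turn travel times into weights in [0, 1]."""
    times = np.asarray(times, dtype=float)
    if weightmethod == 1:
        # Gaussian: weight decays smoothly with travel time
        return np.exp(-0.5 * (times / bandwidth) ** 2)
    if weightmethod == 2:
        # Bi-square: smooth decay that reaches zero at the bandwidth
        w = (1.0 - (times / bandwidth) ** 2) ** 2
        return np.where(times <= bandwidth, w, 0.0)
    # Moving window: every area within the bandwidth counts equally
    return (times <= bandwidth).astype(float)


# Invented travel times between three areas
toy_times = np.array([[0.0, 5.0, 20.0],
                      [5.0, 0.0, 10.0],
                      [20.0, 10.0, 0.0]])

for method in (1, 2, None):
    print("weightmethod =", method)
    print(kernel_weights(toy_times, bandwidth=10, weightmethod=method).round(3))
```

In this sketch a larger bandwidth lets more distant areas contribute to each area's weighted population, while the kernel choice decides how quickly that contribution fades.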



In [ ]:

    
start_time = time.time()

cc.locality = cc.cal_timeMatrix(bandwidth=10, weightmethod=1, matrix=matrix)

print("--- %s seconds for processing ---" % round(time.time() - start_time, 2))

For validation only
Remove the comment (#) if you want to see the values and validate



In [4]:

    
# np.set_printoptions(threshold=np.inf)
# print('Location (coordinates from data):\n', cc.location)
# print()
# print('Population intensity for all groups:\n', cc.locality)

'''To select locality for a specific line (validation), use the index in[x,:]'''
# where x is the number of the desired line

# cc.locality[5,:]









    Out[4]:





'To select locality for a specific line (validation), use the index in[x,:]'

Compute local Dissimilarity



In [5]:

    
diss_local = cc.cal_localDissimilarity()
diss_local = np.asmatrix(diss_local).transpose()

Compute global Dissimilarity



In [6]:

    
diss_global = cc.cal_globalDissimilarity()

Compute local Exposure/Isolation
expo is a matrix of n_group * n_group; therefore, exposure (m,n) = rs[m,n]
the columns are exposure m1 to n1, to n2... n5, m2 to n1....n5

m,m = isolation index of group m
m,n = exposure index of group m to n

Result of all combinations of local groups' exposure/isolation
To select a specific line of m to n, use the index [x]
Each value is a result of the combinations m,n
e.g.: g1xg1, g1xg2, g2xg1, g2xg2 = isolation, exposure, exposure, isolation
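
The column ordering described above can be made concrete with a small helper. The function name `expo_column` is hypothetical, and the row-major ordering (all pairs for the first group, then all pairs for the second, and so on) is an assumption read from the description and from the column names built later in this notebook. Groups are numbered from 0 here.

```python
def expo_column(m, n, n_group):
    """Hypothetical helper: column of the (m, n) pair, groups 0-based."""
    return m * n_group + n


n_group = 2
labels = []
for m in range(n_group):
    for n in range(n_group):
        # Same group on both sides means isolation, otherwise exposure
        kind = "isolation" if m == n else "exposure"
        labels.append((expo_column(m, n, n_group), m, n, kind))

for col, m, n, kind in labels:
    print(f"column {col}: group {m} x group {n} -> {kind}")
```

Under that assumption, the exposure of group m to group n for every area would be read as `expo_local[:, expo_column(m, n, cc.n_group)]`.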



In [7]:

    
expo_local = cc.cal_localExposure()

Compute global Exposure/Isolation



In [18]:

    
expo_global = cc.cal_globalExposure()

Compute local Entropy



In [19]:

    
entro_local = cc.cal_localEntropy()

Compute global Entropy



In [20]:

    
entro_global = cc.cal_globalEntropy()

Compute local Index H



In [21]:

    
idxh_local = cc.cal_localIndexH()

Compute global Index H



In [22]:

    
idxh_global = cc.cal_globalIndexH()

Results

Prepare data for saving on a local file



In [23]:

    
# Concatenate local values from measures
if len(cc.locality) == 0:
    results = np.concatenate((expo_local, diss_local, entro_local, idxh_local), axis=1)
else:
    results = np.concatenate((cc.locality, expo_local, diss_local, entro_local, idxh_local), axis=1)

# Concatenate the results with original data
output = np.concatenate((cc.tract_id, cc.attributeMatrix, results),axis = 1)



In [24]:

    
names = ['id','x','y']

for i in range(cc.n_group):
    names.append('group_'+str(i))

# Intensity columns exist only when cc.locality was computed
if len(cc.locality) != 0:
    for i in range(cc.n_group):
        names.append('intens_'+str(i))

# Isolation (i == j) and exposure (i != j) columns for every pair of groups
for i in range(cc.n_group):
    for j in range(cc.n_group):
        if i == j:
            names.append('iso_' + str(i) + str(j))
        else:
            names.append('exp_' + str(i) + str(j))

names.append('dissimil')
names.append('entropy')
names.append('indexh')

Save Local and global results to a file

The parameter fname corresponds to the folder/filename; change it as you want.
To save in a different folder, use "/" to pass the directory.
The local results will be saved using the name defined, with the "_local" suffix added to the file name.
The global results are automatically saved using the same name with the "_global" suffix added.

It's recommended to save in a different folder from the code, e.g. a folder named result.

The fname value should be changed for every new execution, or the existing files will be overwritten!
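
One way to guard against accidental overwriting is to check for existing output files before saving. The sketch below is only a suggestion: the helper `unique_fname` is hypothetical and not part of Pysegreg, the example path is invented, and the two suffixes are the ones used by the save cell in this notebook.

```python
import os
from datetime import datetime


def unique_fname(fname, suffixes=("_local.csv", "_global.txt")):
    """Hypothetical helper: return fname unchanged if no output file exists,
    otherwise append a timestamp so earlier results are kept."""
    if any(os.path.exists(fname + s) for s in suffixes):
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        return f"{fname}_{stamp}"
    return fname


# Invented example path; replace with your own folder/filename
print(unique_fname("result/run1"))
```

Wrapping the chosen name, as in `fname = unique_fname(fname)`, before the save cell would keep earlier results while leaving the rest of the cell unchanged.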



In [34]:

    
fname = "/Users/sandrofsousa/Downloads/valid/result"

output = pd.DataFrame(output, columns=names)
output.to_csv("%s_local.csv" % fname, sep=",", index=False)
with open("%s_global.txt" % fname, "w") as f:
    f.write('Global dissimilarity: ' + str(diss_global))
    f.write('\nGlobal entropy: ' + str(entro_global))
    f.write('\nGlobal Index H: ' + str(idxh_global))
    f.write('\nGlobal isolation/exposure: \n')
    f.write(str(expo_global))