Pysegreg run - Distance based

Instructions

For fast processing, change only the following before running:

the path/name in the Input file cell (select the file you want to use)
the bandwidth and weight method in the Compute Population Intensity cell
the file name in the variable fname in the section Save Local and global results to a file (the file where the results will be saved)

Make sure you don't reuse a name that already exists, or that file will be replaced.

With the previous steps done, click on the Cell menu and select Run All.



In [1]:

    
# Imports
import numpy as np
np.seterr(all='ignore')
import pandas as pd
from decimal import Decimal
import time

# Import python script with Pysegreg functions
from segregationMetrics import Segreg

# Instantiate segreg as cc
cc = Segreg()

Input file

Pay attention to the new input data structure!

Set the path/name of the input file to be processed in the cell below.
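
The expected layout is not spelled out in this notebook. Judging only from the column names built in the Results section (id, x, y, then one column per group), a quick pre-check of a CSV could look like the sketch below. The layout, the header row, the helper name check_input_layout and the sample rows are all assumptions made for illustration; none of them come from Pysegreg itself.

```python
import io
import pandas as pd

def check_input_layout(df):
    """Hypothetical pre-check. Assumes columns are: id, x, y, then one column per group."""
    if df.shape[1] < 4:
        raise ValueError("expected at least 4 columns: id, x, y and one group")
    # Everything after the id column should be numeric
    numeric = df.iloc[:, 1:].apply(pd.to_numeric, errors="coerce")
    if numeric.isna().any().any():
        raise ValueError("coordinates and group counts must be numeric")
    # Group columns start after x and y
    if (numeric.iloc[:, 2:] < 0).any().any():
        raise ValueError("group counts must be non-negative")
    return numeric.shape[1] - 2  # number of population groups

# Made-up sample rows, only to show the assumed layout
sample_csv = io.StringIO(
    "id,x,y,group_0,group_1\n"
    "1,333245.53,7394772.32,0.77,3.65\n"
    "2,333657.95,7395310.53,0.51,8.41\n"
)
sample = pd.read_csv(sample_csv)
n_groups = check_input_layout(sample)
print("groups found:", n_groups)
```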



In [2]:

    
cc.readAttributesFile('/Users/sandrofsousa/Downloads/valid/Segreg sample.csv')









    Out[2]:





matrix([[  3.33245530e+05,   7.39477232e+06,   7.70000000e-01, ...,
           3.65000000e+00,   1.49000000e+00,   3.45000000e+00],
        [  3.33657950e+05,   7.39531053e+06,   5.10000000e-01, ...,
           8.41000000e+00,   2.12000000e+00,   1.14000000e+00],
        [  3.33381780e+05,   7.39420259e+06,   1.42000000e+00, ...,
           2.20800000e+01,   2.88000000e+01,   9.11000000e+00],
        ..., 
        [  3.00003210e+05,   7.39509971e+06,   2.00000000e-01, ...,
           7.60000000e-01,   2.49000000e+00,   1.69000000e+00],
        [  3.04217510e+05,   7.40542844e+06,   5.00000000e-02, ...,
           1.40000000e-01,   6.20000000e-01,   2.20000000e-01],
        [  2.97174290e+05,   7.41325052e+06,   0.00000000e+00, ...,
           1.00000000e-02,   1.30000000e-01,   7.00000000e-02]])

Measures

Compute Population Intensity

For a non-spatial result, comment out the function call that starts with "cc.locality = ...".

To comment out a line of code, put # at the beginning of the line.

The distance matrix is calculated at this step. Change the population intensity
parameters according to your needs. The parameters are:

bandwidth - set to 5000 m by default; you can change it here
weightmethod - 1 for Gaussian, 2 for bi-square, and empty for moving window
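
To give a feel for what the two parameters control, here is a minimal sketch of the three weighting schemes and of a population intensity computed as a weighted sum over neighbours. The kernel formulas, the function name kernel_weights and the toy coordinates are assumptions chosen for illustration; the exact expressions used inside cal_localityMatrix may differ.

```python
import numpy as np

def kernel_weights(dist, bandwidth, weightmethod=None):
    """Illustrative distance weights (assumed formulas, not Pysegreg's code)."""
    dist = np.asarray(dist, dtype=float)
    if weightmethod == 1:
        # Gaussian: smooth decay, never exactly zero
        return np.exp(-0.5 * (dist / bandwidth) ** 2)
    if weightmethod == 2:
        # Bi-square: decays to zero at the bandwidth, zero beyond it
        w = (1.0 - (dist / bandwidth) ** 2) ** 2
        return np.where(dist < bandwidth, w, 0.0)
    # Moving window: every unit within the bandwidth counts equally
    return np.where(dist <= bandwidth, 1.0, 0.0)

# Toy data: three units, two groups (made-up values)
coords = np.array([[0.0, 0.0], [300.0, 400.0], [3000.0, 4000.0]])
pop = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])

# Pairwise Euclidean distances between units
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))

# Population intensity: weighted sum of each group's population around every unit
intensity = kernel_weights(dist, bandwidth=700) @ pop
print(intensity)
```

In this toy case the first two units are 500 m apart, so a 700 m moving window pools them together, while the third unit is too far away to share population with either.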



In [3]:

    
start_time = time.time()

cc.locality = cc.cal_localityMatrix(bandwidth=700, weightmethod=1)

print("--- %s seconds for processing ---" % (time.time() - start_time))









    



--- 0.30095577239990234 seconds for processing ---

For validation only
Remove the comment (#) if you want to see the values and validate



In [4]:

    
# np.set_printoptions(threshold=np.inf)
# print('Location (coordinates from data):\n', cc.location)
# print()
# print('Population intensity for all groups:\n', cc.locality)

'''To select locality for a specific line (validation), use the index in[x,:]'''
# where x is the number of the desired line

# cc.locality[5,:]









    Out[4]:





'To select locality for a specific line (validation), use the index in[x,:]'

Compute local Dissimilarity



In [5]:

    
diss_local = cc.cal_localDissimilarity()
diss_local = np.asmatrix(diss_local).transpose()

Compute global Dissimilarity



In [6]:

    
diss_global = cc.cal_globalDissimilarity()
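
For orientation, the sketch below computes a common aspatial multigroup dissimilarity index from raw counts. It is an assumption-laden illustration: the function name multigroup_dissimilarity and the toy data are hypothetical, and the spatial version in Pysegreg presumably works on the population intensity computed above rather than on raw counts.

```python
import numpy as np

def multigroup_dissimilarity(pop):
    """Aspatial multigroup dissimilarity (illustrative, not Pysegreg's code).

    pop has one row per unit and one column per group.
    """
    pop = np.asarray(pop, dtype=float)
    unit_totals = pop.sum(axis=1)
    total = pop.sum()
    city_share = pop.sum(axis=0) / total
    # Empty units get a dummy denominator; their weight below is zero anyway
    safe_totals = np.where(unit_totals > 0, unit_totals, 1.0)
    unit_share = pop / safe_totals[:, None]
    # Normalising term: sum over groups of share * (1 - share)
    interaction = (city_share * (1.0 - city_share)).sum()
    deviation = (unit_totals[:, None] * np.abs(unit_share - city_share)).sum()
    return deviation / (2.0 * total * interaction)

segregated = multigroup_dissimilarity([[10, 0], [0, 10]])
even = multigroup_dissimilarity([[5, 5], [5, 5]])
print(segregated, even)
```

The index is 1 when every unit holds a single group and 0 when every unit mirrors the overall composition.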

Compute local Exposure/Isolation
expo is a matrix of n_group * n_group; therefore, exposure (m,n) = rs[m,n]
the columns are exposure m1 to n1, to n2... n5, m2 to n1....n5

m,m = isolation index of group m
m,n = exposure index of group m to n

Result of all combinations of local groups exposure/isolation
To select a specific line of m to n, use the index [x]
Each value is a result of the combinations m,n
e.g.: g1xg1, g1xg2, g2xg1, g2xg2 = isolation, exposure, exposure, isolation
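
The column ordering described above can be illustrated with a small aspatial sketch. It assumes the classic exposure form (the share of group m living in a unit times the share of group n in that unit's population) and a row-major flattening of the m,n pairs; the function name local_exposure and the toy counts are hypothetical, and the values returned by cal_localExposure may be computed differently.

```python
import numpy as np

def local_exposure(pop):
    """Aspatial sketch: column m * n_group + n holds each unit's contribution
    to the exposure of group m to group n. Assumes every group has a
    non-zero total."""
    pop = np.asarray(pop, dtype=float)
    n_units, n_group = pop.shape
    group_totals = pop.sum(axis=0)
    unit_totals = pop.sum(axis=1)
    safe_totals = np.where(unit_totals > 0, unit_totals, 1.0)
    share_of_group = pop / group_totals              # share of group m living in the unit
    unit_composition = pop / safe_totals[:, None]    # share of group n within the unit
    # One n_group x n_group block per unit, flattened row by row
    expo = share_of_group[:, :, None] * unit_composition[:, None, :]
    return expo.reshape(n_units, n_group * n_group)

# Same naming rule as the Results section: iso_ on the diagonal, exp_ elsewhere
labels = [("iso_" if m == n else "exp_") + str(m) + str(n)
          for m in range(2) for n in range(2)]

expo = local_exposure([[10, 0], [0, 10]])
print(labels)
print(expo.sum(axis=0))  # summing the local values gives the global index
```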



In [7]:

    
expo_local = cc.cal_localExposure()

Compute global Exposure/Isolation



In [18]:

    
expo_global = cc.cal_globalExposure()

Compute local Entropy



In [19]:

    
entro_local = cc.cal_localEntropy()

Compute global Entropy



In [20]:

    
entro_global = cc.cal_globalEntropy()

Compute local Index H



In [21]:

    
idxh_local = cc.cal_localIndexH()

Compute global Index H



In [22]:

    
idxh_global = cc.cal_globalIndexH()
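
As a reading aid for the entropy and Index H results, the sketch below uses the usual aspatial definitions: Shannon entropy of the group shares, and H as the population-weighted gap between overall and per-unit entropy, relative to the overall entropy. The function names entropy and index_h are hypothetical, and Pysegreg's spatial versions presumably replace the per-unit composition with the local population intensity.

```python
import numpy as np

def entropy(shares):
    """Shannon entropy (natural log) along the last axis; zero shares contribute nothing."""
    shares = np.asarray(shares, dtype=float)
    safe = np.where(shares > 0, shares, 1.0)  # log(1) = 0
    return -(shares * np.log(safe)).sum(axis=-1)

def index_h(pop):
    """Aspatial information theory index H (illustrative, not Pysegreg's code)."""
    pop = np.asarray(pop, dtype=float)
    unit_totals = pop.sum(axis=1)
    total = unit_totals.sum()
    global_entropy = entropy(pop.sum(axis=0) / total)
    safe_totals = np.where(unit_totals > 0, unit_totals, 1.0)
    local_entropy = entropy(pop / safe_totals[:, None])
    weighted_gap = (unit_totals * (global_entropy - local_entropy)).sum()
    return weighted_gap / (total * global_entropy)

h_segregated = index_h([[10, 0], [0, 10]])
h_even = index_h([[5, 5], [5, 5]])
print(h_segregated, h_even)
```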

Results

Prepare data for saving on a local file



In [23]:

    
# Concatenate local values from measures
if len(cc.locality) == 0:
    results = np.concatenate((expo_local, diss_local, entro_local, idxh_local), axis=1)
else:
    results = np.concatenate((cc.locality, expo_local, diss_local, entro_local, idxh_local), axis=1)

# Concatenate the results with original data
output = np.concatenate((cc.tract_id, cc.attributeMatrix, results),axis = 1)



In [24]:

    
names = ['id','x','y']

for i in range(cc.n_group):
    names.append('group_'+str(i))

# Intensity columns exist only when the locality matrix was computed
if len(cc.locality) != 0:
    for i in range(cc.n_group):
        names.append('intens_'+str(i))

# Isolation (i == j) and exposure (i != j) columns, one per pair of groups
for i in range(cc.n_group):
    for j in range(cc.n_group):
        if i == j:
            names.append('iso_' + str(i) + str(j))
        else:
            names.append('exp_' + str(i) + str(j))

names.append('dissimil')
names.append('entropy')
names.append('indexh')

Save Local and global results to a file

The parameter fname corresponds to the folder/filename; change it as you want.
To save in a different folder, use "/" to pass the directory.
The local results are saved under the name defined, with the "_local" postfix added to the file name.
The global results are automatically saved under the same name, with the "_global" postfix added.

It's recommended to save in a folder separate from the code, e.g. a folder named result.

The fname value should be changed for every new execution, or the previous results will be overwritten!
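
One way to guard against accidental overwrites is to build both output paths from fname and refuse to continue if either file already exists. The helper below is a hypothetical sketch (result_paths is not part of Pysegreg); it only mirrors the "_local.csv" and "_global.txt" naming used in the next cell and uses a temporary folder for the demonstration.

```python
import tempfile
from pathlib import Path

def result_paths(fname, overwrite=False):
    """Hypothetical helper: derive the two output paths and check for clashes."""
    local_path = Path(f"{fname}_local.csv")
    global_path = Path(f"{fname}_global.txt")
    if not overwrite:
        for path in (local_path, global_path):
            if path.exists():
                raise FileExistsError(f"{path} already exists; choose another fname")
    # Create the target folder (e.g. "result") if it is missing
    local_path.parent.mkdir(parents=True, exist_ok=True)
    return local_path, global_path

with tempfile.TemporaryDirectory() as tmp:
    fname = Path(tmp) / "result" / "run1"
    local_path, global_path = result_paths(fname)
    local_path.write_text("id,x,y\n")
    try:
        result_paths(fname)  # second call with the same name is refused
        refused = False
    except FileExistsError:
        refused = True
    print(local_path.name, global_path.name, refused)
```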



In [34]:

    
fname = "/Users/sandrofsousa/Downloads/valid/result"

output = pd.DataFrame(output, columns=names)
output.to_csv("%s_local.csv" % fname, sep=",", index=False)
with open("%s_global.txt" % fname, "w") as f:
    f.write('Global dissimilarity: ' + str(diss_global))
    f.write('\nGlobal entropy: ' + str(entro_global))
    f.write('\nGlobal Index H: ' + str(idxh_global))
    f.write('\nGlobal isolation/exposure: \n')
    f.write(str(expo_global))



In [85]:

    
# code to save data as a continuous string - Marcus request for R use

# names2 = ['dissimil', 'entropy', 'indexh']

# for i in range(cc.n_group):
#         for j in range(cc.n_group):
#             if i == j:
#                 names2.append('iso_' + str(i) + str(j))
#             else:
#                 names2.append('exp_' + str(i) + str(j))

# values = [diss_global, entro_global, idxh_global]
# for i in expo_global: values.append(i)

# file2 = "/Users/sandrofsousa/Downloads/"
# with open("%s_global.csv" % file2, "w") as f:
#     f.write(', '.join(names2) + '\n')
#     f.write(', '.join(str(i) for i in values))