Pysegreg run - Distance based

Instructions

For fast processing, you only need to change the following before running:

  • path/name in the Input file cell (select the file you want to use)
  • bandwidth and weight method in the Compute Population Intensity cell
  • file name in the variable fname in the section Save Local and global results to a file (the file where the results will be saved)

Make sure you don't reuse an existing name, or that file will be replaced.

With the previous steps done, click on the Cell menu and select Run All.


In [1]:
# Imports
import numpy as np
np.seterr(all='ignore')
import pandas as pd
from decimal import Decimal
import time

# Import python script with Pysegreg functions
from segregationMetrics import Segreg

# Instantiate segreg as cc
cc = Segreg()

Input file

Note the new data structure for the input file!

In the cell below, set the path/name of the input file to be processed.

Data Format
ID | X | Y | group 1 | group 2 | group n
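
As a minimal sketch of this layout, the snippet below builds a tiny table with pandas and writes it out as CSV text. The column names, values, and number of groups are made up for illustration. Whether readAttributesFile expects a header row is not stated here, so check that against your own input before relying on it.

```python
import io
import pandas as pd

# Hypothetical sample: one row per areal unit, an ID, X/Y coordinates,
# then one population column per group.
sample = pd.DataFrame({
    "ID": [1, 2, 3],
    "X": [1000.0, 1500.0, 2000.0],
    "Y": [5000.0, 5200.0, 5400.0],
    "group_1": [10.0, 0.0, 4.5],
    "group_2": [2.0, 7.5, 1.0],
})

# Write to an in-memory buffer here; pass a file path to to_csv to write to disk.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```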


In [2]:
cc.readAttributesFile('/Users/sandrofsousa/Downloads/valid/Segreg sample.csv')


Out[2]:
matrix([[  3.33245530e+05,   7.39477232e+06,   7.70000000e-01, ...,
           3.65000000e+00,   1.49000000e+00,   3.45000000e+00],
        [  3.33657950e+05,   7.39531053e+06,   5.10000000e-01, ...,
           8.41000000e+00,   2.12000000e+00,   1.14000000e+00],
        [  3.33381780e+05,   7.39420259e+06,   1.42000000e+00, ...,
           2.20800000e+01,   2.88000000e+01,   9.11000000e+00],
        ..., 
        [  3.00003210e+05,   7.39509971e+06,   2.00000000e-01, ...,
           7.60000000e-01,   2.49000000e+00,   1.69000000e+00],
        [  3.04217510e+05,   7.40542844e+06,   5.00000000e-02, ...,
           1.40000000e-01,   6.20000000e-01,   2.20000000e-01],
        [  2.97174290e+05,   7.41325052e+06,   0.00000000e+00, ...,
           1.00000000e-02,   1.30000000e-01,   7.00000000e-02]])

Measures

Compute Population Intensity

For a non-spatial result, comment out the function call that starts with "cc.locality = ..." in the cell below.

  • to comment out a line of code, put # at the beginning of the line

The distance matrix is calculated at this step. Change the parameters for the population
intensity according to your needs. The parameters are:

  • bandwidth - 5000m by default; you can change it in the cell below (this run uses 700)
  • weightmethod - 1 for Gaussian, 2 for bi-square, and empty for moving window
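
To give a feel for how the three weighting options differ, here is a rough sketch of commonly used forms of these kernels. The function kernel_weights is hypothetical and is not taken from the Pysegreg source; the exact formulas used by cal_localityMatrix may differ, so treat this only as an illustration of how weights fall off with distance.

```python
import numpy as np

def kernel_weights(dist, bandwidth, weightmethod=None):
    """Illustrative distance weights (assumed forms, not Pysegreg's code)."""
    dist = np.asarray(dist, dtype=float)
    if weightmethod == 1:
        # Gaussian: weights decay smoothly and never reach exactly zero
        return np.exp(-0.5 * (dist / bandwidth) ** 2)
    if weightmethod == 2:
        # Bi-square: smooth decay to zero at the bandwidth, zero beyond it
        w = (1.0 - (dist / bandwidth) ** 2) ** 2
        return np.where(dist < bandwidth, w, 0.0)
    # Moving window: every point within the bandwidth counts equally
    return np.where(dist <= bandwidth, 1.0, 0.0)

# Distances in the same unit as the bandwidth (metres in this notebook)
distances = np.array([0.0, 350.0, 700.0, 1400.0])
for method in (1, 2, None):
    print(method, kernel_weights(distances, bandwidth=700, weightmethod=method))
```

A smaller bandwidth makes each locality depend mostly on its nearest neighbours, while a larger one smooths the population intensity over a wider area.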

In [3]:
start_time = time.time()

cc.locality = cc.cal_localityMatrix(bandwidth=700, weightmethod=1)

print("--- %s seconds for processing ---" % (time.time() - start_time))


--- 0.30095577239990234 seconds for processing ---

For validation only
Remove the comment (#) if you want to see the values and validate


In [4]:
# np.set_printoptions(threshold=np.inf)
# print('Location (coordinates from data):\n', cc.location)
# print()
# print('Population intensity for all groups:\n', cc.locality)

'''To select locality for a specific line (validation), use the index in[x,:]'''
# where x is the number of the desired line

# cc.locality[5,:]


Out[4]:
'To select locality for a specific line (validation), use the index in[x,:]'

Compute local Dissimilarity


In [5]:
diss_local = cc.cal_localDissimilarity()
diss_local = np.asmatrix(diss_local).transpose()

Compute global Dissimilarity


In [6]:
diss_global = cc.cal_globalDissimilarity()

Compute local Exposure/Isolation
expo is a matrix of n_group * n_group; therefore exposure(m,n) = rs[m,n].
The columns are the exposure of m1 to n1, to n2 ... n5, then m2 to n1 ... n5, and so on.

  • m,m = isolation index of group m
  • m,n = exposure index of group m to n

The result holds all combinations of local group exposure/isolation.
To select a specific line of m to n, use the index [x].
Each value is the result of one combination m,n,
e.g.: g1xg1, g1xg2, g2xg1, g2xg2 = isolation, exposure, exposure, isolation
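
The ordering described above can be sketched as a simple index calculation. This assumes the columns of the local exposure result follow the same order as the naming loop used later in this notebook (m varies slowest, n fastest). The helper column_of and the demo array are hypothetical and use made-up values.

```python
import numpy as np

n_group = 2  # hypothetical number of groups

# Column labels in row-major order: m varies slowest, n fastest
labels = []
for m in range(n_group):
    for n in range(n_group):
        prefix = "iso" if m == n else "exp"
        labels.append("%s_%d%d" % (prefix, m, n))

def column_of(m, n, n_group):
    """Position of the (m, n) pair in the flattened column order."""
    return m * n_group + n

# Made-up local results: 3 areal units x (n_group * n_group) columns
expo_demo = np.arange(12, dtype=float).reshape(3, n_group * n_group)

# Exposure of group 0 to group 1 for every areal unit
k = column_of(0, 1, n_group)
print(labels)
print(labels[k], expo_demo[:, k])
```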


In [7]:
expo_local = cc.cal_localExposure()

Compute global Exposure/Isolation


In [18]:
expo_global = cc.cal_globalExposure()

Compute local Entropy


In [19]:
entro_local = cc.cal_localEntropy()

Compute global Entropy


In [20]:
entro_global = cc.cal_globalEntropy()

Compute local Index H


In [21]:
idxh_local = cc.cal_localIndexH()

Compute global Index H


In [22]:
idxh_global = cc.cal_globalIndexH()

Results

Prepare data for saving on a local file


In [23]:
# Concatenate local values from measures
if len(cc.locality) == 0:
    results = np.concatenate((expo_local, diss_local, entro_local, idxh_local), axis=1)
else:
    results = np.concatenate((cc.locality, expo_local, diss_local, entro_local, idxh_local), axis=1)

# Concatenate the results with original data
output = np.concatenate((cc.tract_id, cc.attributeMatrix, results),axis = 1)

In [24]:
names = ['id', 'x', 'y']

# Original group columns
for i in range(cc.n_group):
    names.append('group_' + str(i))

# Population intensity columns exist only for the spatial version
if len(cc.locality) != 0:
    for i in range(cc.n_group):
        names.append('intens_' + str(i))

# Isolation (i == j) and exposure (i != j) columns, one per group pair
for i in range(cc.n_group):
    for j in range(cc.n_group):
        if i == j:
            names.append('iso_' + str(i) + str(j))
        else:
            names.append('exp_' + str(i) + str(j))

names.append('dissimil')
names.append('entropy')
names.append('indexh')

Save Local and global results to a file

The parameter fname corresponds to the folder/filename; change it as needed.
To save in a different folder, use "/" to separate the directories in the path.
The local results are saved under the name defined, with the suffix "_local" added to the file name.
The global results are automatically saved under the same name, with the suffix "_global" added.

It's recommended to save in a different folder from the code, e.g. a folder named result.

The fname value should be changed for every new execution, or the existing files will be overwritten!
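
One way to guard against accidental overwriting is to check whether the output files already exist before saving. The sketch below is only a suggestion: the helper existing_outputs is hypothetical, and the suffixes are taken from the save cell in this notebook. It runs in a temporary folder so no real results are touched.

```python
import os
import tempfile

def existing_outputs(fname, suffixes=("_local.csv", "_global.txt")):
    """Return the output paths that already exist for this fname."""
    return [fname + s for s in suffixes if os.path.exists(fname + s)]

# Demonstration in a temporary folder so nothing real is touched
with tempfile.TemporaryDirectory() as folder:
    fname_demo = os.path.join(folder, "result")
    before = existing_outputs(fname_demo)
    # Simulate a previous run that left a local results file behind
    with open(fname_demo + "_local.csv", "w") as f:
        f.write("id,x,y\n")
    after = existing_outputs(fname_demo)

print("existing before:", before)
print("existing after:", [os.path.basename(p) for p in after])
```

In the notebook itself, calling existing_outputs(fname) just before the save cell and stopping when the returned list is not empty would avoid replacing earlier results.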


In [34]:
fname = "/Users/sandrofsousa/Downloads/valid/result"

output = pd.DataFrame(output, columns=names)
output.to_csv("%s_local.csv" % fname, sep=",", index=False)
with open("%s_global.txt" % fname, "w") as f:
    f.write('Global dissimilarity: ' + str(diss_global))
    f.write('\nGlobal entropy: ' + str(entro_global))
    f.write('\nGlobal Index H: ' + str(idxh_global))
    f.write('\nGlobal isolation/exposure: \n')
    f.write(str(expo_global))

In [85]:
# code to save data as a continuous string - Marcus request for R use

# names2 = ['dissimil', 'entropy', 'indexh']

# for i in range(cc.n_group):
#         for j in range(cc.n_group):
#             if i == j:
#                 names2.append('iso_' + str(i) + str(j))
#             else:
#                 names2.append('exp_' + str(i) + str(j))

# values = [diss_global, entro_global, idxh_global]
# for i in expo_global: values.append(i)

# file2 = "/Users/sandrofsousa/Downloads/"
# with open("%s_global.csv" % file2, "w") as f:
#     f.write(', '.join(names2) + '\n')
#     f.write(', '.join(str(i) for i in values))