Illustrating the 'floor' option in a discrete distribution

When selecting a discrete distribution, either one- or two-dimensional, you can configure a 'floor' option, which defaults to 'no'. The examples below use a 2D distribution, but the option works the same way for a 1D distribution. The example also shows how you can pass a matrix (an array of arrays) as extra data to the simulation. This matrix is written to a CSV file, and the path of that file is communicated to the simulation program.
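
To make the idea concrete before running the simulation, here is a minimal sketch in plain Python of how sampling from a 2D discrete distribution could work with and without 'floor'. It is not Simpact's implementation. The function `sample_discrete_2d` is a hypothetical helper, and the mapping of matrix cell (row, column) to the unit square starting at (column, row) is an assumption made for illustration.

```python
import random

def sample_discrete_2d(weights, floor=False, rng=random):
    """Draw one (x, y) point from a grid of relative weights.

    Assumption (not taken from Simpact): the cell at (row, col) covers the
    unit square [col, col+1) x [row, row+1).
    """
    # Enumerate the cells and their weights in the same row-major order
    cells = [(col, row) for row, line in enumerate(weights)
                        for col, _ in enumerate(line)]
    flat = [w for line in weights for w in line]

    # Pick a cell with probability proportional to its weight
    col, row = rng.choices(cells, weights=flat, k=1)[0]

    if floor:
        # Only the corner of the chosen cell is returned
        return float(col), float(row)
    # Otherwise the point is spread uniformly inside the chosen cell
    return col + rng.random(), row + rng.random()

probabilities = [
    [ 1, 2, 3, 4],
    [ 5, 6, 7, 0]
]

rng = random.Random(12345)  # fixed seed so the sketch is reproducible
continuous = [sample_discrete_2d(probabilities, floor=False, rng=rng) for _ in range(1000)]
floored = [sample_discrete_2d(probabilities, floor=True, rng=rng) for _ in range(1000)]
```

In this sketch the points in `floored` are always whole-numbered cell corners, while the points in `continuous` fill the selected cells. The cell with weight 0 is never selected in either case.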



In [1]:

    
# First, we'll load some modules that we're going to need
%matplotlib inline
import matplotlib.pyplot as plt
import pysimpactcyan
import pandas as pd



In [2]:

    
simpact = pysimpactcyan.PySimpactCyan()

Setting data directory to /usr/local/share/simpact-cyan/



In [3]:

    
# In this example we're going to (ab)use the geographic location of a person
# to show how to use a 2D discrete distribution. Upon initialization, the
# location of each person is drawn from the discrete distribution.
#
# We don't need many events, just one to make sure that the population is
# initialized. We'll create many men and women in the simulation, but turn
# off relationship formation (using the 'eyecap' setting) to avoid scheduling
# a large number of events that we don't need anyway.
cfg = { }
cfg["population.eyecap.fraction"] = 0
cfg["population.nummen"] = 100000
cfg["population.numwomen"] = 100000
cfg["population.maxevents"] = 1

# This matrix will be communicated to the simulation below, and will be the
# basis of our 2D distribution.
probabilities = [
    [ 1, 2, 3, 4],
    [ 5, 6, 7, 0]
]

# Here, we specify that we're going to use a discrete distribution for the
# location of each person
cfg["person.geo.dist2d.type"] = "discrete"
# By starting a field with "data:", we can refer to a CSV file that's 
# generated by data passed to the simulation using the 'dataFiles' argument
# (see below). Here, 'probs' is the name of the data file
cfg["person.geo.dist2d.discrete.densfile"] = "data:probs"
cfg["person.geo.dist2d.discrete.width"] = 4
cfg["person.geo.dist2d.discrete.height"] = 2

# We're going to assign the name 'probs' to the probability matrix that was
# specified above, since that's the name we've already used in the config
# settings
data = { "probs": probabilities }

# Finally, we start the simulation, read the person log and plot the location
# of each person as a 2D histogram. The histogram has the same structure as
# the data file that was passed.
ret = simpact.run(cfg, "/tmp/simptest", dataFiles=data)
persons = pd.read_csv(ret["logpersons"])
plt.hist2d(persons["XCoord"], persons["YCoord"], bins=20);

Using identifier 'simpact-cyan-2015-09-29-12-20-49_21737_JLg70DkJ-'
Results will be stored in directory '/tmp/simptest'
Running simpact executable...
Done.

# read seed from /dev/urandom
# Using seed 321855582
# Performing extra check on read configuration parameters
# WARNING: ignoring consistency check for config key population.agedistfile (config value is '/usr/local/share/simpact-cyan/sa_2003.csv')
# mNRM: using advanced algorithm
# Release version
# Simpact version is: 0.20.0testing
# Number of events executed is 1
# Started with 200000 people, ending with 200000 (difference is 0)
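
To judge whether the histogram looks right, it helps to work out how many persons each cell should receive. The sketch below assumes that the matrix entries act as relative weights that are normalized by their sum. The variable names `shares` and `expected` are introduced here for illustration only.

```python
from fractions import Fraction

probabilities = [
    [ 1, 2, 3, 4],
    [ 5, 6, 7, 0]
]
population = 100000 + 100000  # men + women, as configured above

# Assumption: each entry is a relative weight, normalized by the total
total = sum(sum(row) for row in probabilities)
shares = [[Fraction(w, total) for w in row] for row in probabilities]

# Expected number of persons per cell
expected = [[float(s * population) for s in row] for row in shares]

for row in expected:
    print(["%.0f" % e for e in row])
```

Under this assumption the weights add up to 28, so the cell with weight 7 is expected to hold a quarter of the population (50000 persons), and the cell with weight 0 stays empty.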



In [4]:

    
# Here, the location of a few persons is displayed. As you can see, the location
# can be anywhere inside the specified region (but with the specified probabilities)
p2 = persons[persons["ID"] < 10]
p2[["ID","XCoord","YCoord"]]



In [5]:

    
# The default setting of the 'floor' parameter was 'no'. To see what the effect of
# this parameter is, let's now set it to 'yes'
cfg["person.geo.dist2d.discrete.floor"] = "yes"

ret = simpact.run(cfg, "/tmp/simptest", dataFiles=data)
persons = pd.read_csv(ret["logpersons"])

# When creating the 2D histogram again, you'll notice that only a few points are
# actually used. This is because the 'floor' setting restricts the coordinates
# to the corners of the grid cells.
plt.hist2d(persons["XCoord"], persons["YCoord"], bins=20);

Using identifier 'simpact-cyan-2015-09-29-12-20-51_21737_gbvQCmRj-'
Results will be stored in directory '/tmp/simptest'
Running simpact executable...
Done.

# read seed from /dev/urandom
# Using seed 1210235744
# Performing extra check on read configuration parameters
# WARNING: ignoring consistency check for config key population.agedistfile (config value is '/usr/local/share/simpact-cyan/sa_2003.csv')
# mNRM: using advanced algorithm
# Release version
# Simpact version is: 0.20.0testing
# Number of events executed is 1
# Started with 200000 people, ending with 200000 (difference is 0)



In [6]:

    
# The effect of this 'floor' parameter is also clear when showing the location of
# a few persons. These coordinates can no longer vary continuously, but are restricted
# to the corners of the bins.
p2 = persons[persons["ID"] < 10]
p2[["ID","XCoord","YCoord"]]
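
Beyond inspecting a few rows, the restriction to cell corners can be checked over all persons at once. The following is a minimal sketch. `summarize_locations` is a hypothetical helper, and the sample values are made up for illustration; they are not simulation output.

```python
def summarize_locations(xs, ys):
    """Return (number of distinct points, whether all coordinates are whole numbers)."""
    points = set(zip(xs, ys))
    whole = all(float(v).is_integer() for point in points for v in point)
    return len(points), whole

# Hypothetical sample values, for illustration only (not simulation output)
floored_x = [0.0, 1.0, 1.0, 3.0, 2.0]
floored_y = [0.0, 1.0, 1.0, 0.0, 1.0]

count, whole = summarize_locations(floored_x, floored_y)
print(count, whole)
```

The same helper should accept the columns of the person log, for example `summarize_locations(persons["XCoord"], persons["YCoord"])`. For a 4 by 2 grid with 'floor' set to 'yes', one would expect at most 8 distinct points, all with whole-numbered coordinates.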



In [ ]:

   ID    XCoord    YCoord
0   1  2.201008  1.295633
1   2  2.395076  0.509166
2   3  0.035034  0.430343
3   4  2.388913  0.007306
4   5  1.843243  0.529983
5   6  1.378913  0.174881
6   7  3.724943  1.332569
7   8  2.663234  0.634857
8   9  3.117691  1.452634