iPosition Monte Carlo Simulation

This notebook contains several Monte Carlo simulations for iPosition data. Three simulation methods are used to determine chance levels: naive 2D simulation, histogram (data-driven) simulation, and Dirichlet distribution simulation. The "actual coordinates" are taken either from real coordinates or from random coordinates.

First, we import the pipeline from the `cogrecon` package. Any directory paths used in this notebook (such as `root_dir` below) must be changed to match where the files are stored on your machine.


In [1]:
from cogrecon.core.full_pipeline import full_pipeline, get_header_labels
from cogrecon.core.data_structures import TrialData, ParticipantData, AnalysisConfiguration

Naive 2D Simulation

This section contains the naive 2D simulations, in which uniformly random placements are compared against either random target points or actual target points.
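As a rough illustration of what "chance level" means here, the following is a minimal sketch that does not use the `cogrecon` pipeline. It assumes that misplacement is the mean Euclidean distance between each target and its placement in the unit square; the helper names `naive_trial` and `mean_misplacement` are hypothetical and exist only for this example. The mean distance between two independent uniform points in the unit square is known to be about 0.5214, which is consistent with the "Original Misplacement" values reported further down.

```python
import math
import random


def naive_trial(n_items, actual=None, rng=random):
    """Return (actual, data) for one simulated trial in the unit square.

    If `actual` is None, the target points are drawn uniformly at random;
    otherwise the supplied targets are kept and only the placements are random.
    """
    if actual is None:
        actual = [(rng.random(), rng.random()) for _ in range(n_items)]
    data = [(rng.random(), rng.random()) for _ in range(len(actual))]
    return actual, data


def mean_misplacement(actual, data):
    """Mean Euclidean distance between each target and its placement (assumed metric)."""
    total = sum(math.hypot(ax - dx, ay - dy)
                for (ax, ay), (dx, dy) in zip(actual, data))
    return total / len(actual)


rng = random.Random(0)  # fixed seed so the estimate is repeatable
estimates = [mean_misplacement(*naive_trial(4, rng=rng)) for _ in range(20000)]
chance_level = sum(estimates) / len(estimates)
print(round(chance_level, 2))
```

Passing a list of real target coordinates as `actual` gives the second variant described above, where only the placements are random.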

First, we define some global variables about our simulation.


In [2]:
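try:
    import joblib  # standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23
except ImportError:
    from sklearn.externals import joblib  # fallback for older scikit-learn installs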

sim_iterations = 1000  # For convenience, the number of iterations each simulation configuration should run

# Define the dimensions, number of items, and iterations for each test
root_dir = r'C:\Users\Kevin\Google Drive\iPyNotebooks\iPosition'
# kde_model = joblib.load(root_dir + r'\pat_kde.pkl')
test_configs = [
    {'dims': 2, 'items': 2, 'iterations': sim_iterations, 'random_source': 'naive2d'},  # 'kde', 'model': kde_model},
    {'dims': 2, 'items': 3, 'iterations': sim_iterations, 'random_source': 'naive2d'},  # 'kde', 'model': kde_model},
    {'dims': 2, 'items': 4, 'iterations': sim_iterations, 'random_source': 'naive2d'},  # 'kde', 'model': kde_model},
    {'dims': 2, 'items': 5, 'iterations': sim_iterations, 'random_source': 'naive2d'},  # 'kde', 'model': kde_model},
    {'dims': 2, 'items': 6, 'iterations': sim_iterations, 'random_source': 'naive2d'},  # 'kde', 'model': kde_model},
    {'dims': 2, 'items': 7, 'iterations': sim_iterations, 'random_source': 'naive2d'}  # 'kde', 'model': kde_model}
]

remove_columns = [4, 18, 40, 41, 42, 43]  # Output columns that cannot easily be averaged or given a standard deviation, so we remove them

save_filename = 'naive_2d_monte_carlo.p'  # The filename to save the output as we go

In [3]:
import numpy as np
import numpy.random as rand
import logging
import time
import os
import pickle

# Disable some outputs that we don't need given our circumstances
logger = logging.getLogger()
logger.disabled = True
np.seterr(invalid='ignore')


Out[3]:
{'divide': 'warn', 'invalid': 'warn', 'over': 'warn', 'under': 'ignore'}

In [4]:
# Helper for getting the appropriate headers for columns we keep
def get_output_labels():
    headers = get_header_labels()
    headers = np.delete(headers, remove_columns)
    return headers

# Helper for printing our variables as we run
def print_read_friendly(o):
    headers = get_output_labels()
    
    row_format = "{0:55}: {1:15}"
    for h, oo in zip(headers, o):
        print(row_format.format(h, oo))

# Helper for converting our outputs to an easy-to-save format
def get_save_data(_test_configs, _output_labels, _mean_outputs, _std_outputs, _times):
    return {
        'test_configs': _test_configs,
        'output_labels': _output_labels,
        'mean_outputs': _mean_outputs,
        'std_outputs': _std_outputs,
        'times': _times
    }
    
# Helper for saving our data
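def checkpoint_data(save_filename, data):
    # Use a context manager so the file handle is closed (and flushed) after every checkpoint
    with open(save_filename, 'wb') as f:
        pickle.dump(data, f)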

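# Helper for drawing KDE samples that all fall inside the given bounds
def generate_bounded_samples(kde, n_samples, x_range=(0, 1), y_range=(0, 1)):
    # Note: the KDE columns are unpacked as (y, x)
    y_sample, x_sample = np.transpose(kde.sample(n_samples=n_samples))
    count = np.inf
    # Repeat full passes until one pass finds no out-of-bounds samples
    while count != 0:
        count = 0
        for idx, (x, y) in enumerate(zip(x_sample, y_sample)):
            if (not (x_range[0] <= x <= x_range[1])) or (not (y_range[0] <= y <= y_range[1])):
                # Redraw this sample; the replacement is re-checked on the next pass
                count += 1
                yy, xx = np.transpose(kde.sample(n_samples=1))
                x_sample[idx] = xx[0]
                y_sample[idx] = yy[0]
    return x_sample, y_sample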
    
# Helper for getting random data
def get_random_data(n, dims, source='naive2d', model=None):
    if source == 'naive2d':
        # Both the targets ("actual") and the placements ("data") are uniform random
        actual = np.array([np.array([rand.random() for _ in range(dims)]) for _ in range(n)])
        data = np.array([np.array([rand.random() for _ in range(dims)]) for _ in range(n)])
    elif source == 'kde':
        if model is None:
            raise ValueError("No model provided for KDE.")
        if dims != 2:
            raise ValueError("Dimension must be 2D for KDE.")
        # Targets come from the KDE model; placements remain uniform random
        x, y = generate_bounded_samples(model, n)
        actual = np.array([np.array([xx, yy]) for xx, yy in zip(x, y)])
        data = np.array([np.array([rand.random() for _ in range(dims)]) for _ in range(n)])
    else:
        # Without this branch, an unknown source fails later with an unbound-variable error
        raise ValueError("Unknown random source: {0}".format(source))
    return actual.tolist(), data.tolist()
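The resampling in `generate_bounded_samples` is a form of rejection sampling: draws that fall outside the box are discarded and redrawn until every sample is in bounds. The sketch below shows the same idea with batched redraws. It is an illustration under stated assumptions, not the notebook's implementation: `GaussianSampler` is a hypothetical stand-in for a fitted KDE that only mimics a `sample(n_samples=...)` method, and it returns points as `(x, y)`, whereas the helper above unpacks the KDE columns as `(y, x)`.

```python
import random


class GaussianSampler(object):
    """Hypothetical stand-in for a fitted KDE: exposes sample(n_samples)."""

    def __init__(self, mean=0.5, sd=0.4, seed=0):
        self._rng = random.Random(seed)
        self._mean, self._sd = mean, sd

    def sample(self, n_samples=1):
        return [(self._rng.gauss(self._mean, self._sd),
                 self._rng.gauss(self._mean, self._sd))
                for _ in range(n_samples)]


def sample_within_bounds(sampler, n_samples, x_range=(0, 1), y_range=(0, 1)):
    """Collect n_samples in-bounds points, redrawing only as many as were rejected."""
    kept = []
    while len(kept) < n_samples:
        batch = sampler.sample(n_samples=n_samples - len(kept))
        kept.extend((x, y) for x, y in batch
                    if x_range[0] <= x <= x_range[1]
                    and y_range[0] <= y <= y_range[1])
    return kept


points = sample_within_bounds(GaussianSampler(), 100)
print(len(points))
```

Either variant only terminates if the sampler has a non-zero chance of producing an in-bounds point, so a model fitted far outside the bounds would loop indefinitely.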

In [5]:
np.seterr(divide='ignore')

# Lists to store our main outputs
mean_outputs = []
std_outputs = []
times = []

# Iterate through our configurations
for config in test_configs:
    # Get config parameters
    dims = config['dims']
    items = config['items']
    iterations = config['iterations']
    random_source = config['random_source']
    model = config.get('model')  # None unless the configuration supplies a KDE model
    
    # List to store each iteration output - for large iterations, this is the list that can balloon up
    outputs = []
    
    # Record start runtime
    start_time = time.time()
    
    # Iterate the number of times requested
    for _ in range(iterations):
        # Generate random data
        actual, data = get_random_data(items, dims, random_source, model)
        
        # Run the pipeline
        output = full_pipeline(ParticipantData([TrialData(actual, data)]), AnalysisConfiguration(), visualize=False)[0]

        # Delete the removal columns and append the output
        output = np.delete(output, remove_columns, axis=0)
        outputs.append(output)
    
    # Save the runtime, mean of outputs, and standard deviation of outputs (values are cast to float before the standard deviation to avoid errors)
    duration = time.time() - start_time
    try:
        avgs = np.nanmean(outputs, axis=0)
    except ZeroDivisionError:
        print(np.array(outputs).tolist())
        break
    stds = np.nanstd([[float(x) for x in inner] for inner in outputs], axis=0)
    
    mean_outputs.append(avgs)
    std_outputs.append(stds)
    times.append(duration)
    
    # Checkpoint/save the data to file
    checkpoint_data(save_filename, get_save_data(test_configs, get_output_labels(), mean_outputs, std_outputs, times))

    # Print a report on this configuration for the user
    print('{0} iterations run in {1} seconds ({2} average) on {3}.'.format(iterations, duration, duration/iterations, config))
    print('_'*100)
    print_read_friendly(avgs)
    print('_'*100)
    print('_'*100)


C:\Program Files\Anaconda3\envs\iposition\lib\site-packages\numpy\lib\nanfunctions.py:1423: RuntimeWarning: Degrees of freedom <= 0 for slice.
  keepdims=keepdims)
1000 iterations run in 6.17199993134 seconds (0.00617199993134 average) on {'dims': 2, 'random_source': 'naive2d', 'iterations': 1000, 'items': 2}.
____________________________________________________________________________________________________
Original Misplacement                                  :  0.521617249092
Original Swap                                          :           0.261
Original Edge Resizing                                 : 0.0940437123633
Original Edge Distortion                               :           1.006
Pre-Processed Accurate Placements                      :             2.0
Pre-Processed Inaccurate Placements                    :             0.0
Pre-Processed Accuracy Threshold                       :   0.71740326488
Deanonymized Accurate Placements                       :             2.0
Deanonymized Inaccurate Placements                     :             0.0
Deanonymized Accuracy Threshold                        :  0.635353265272
Raw Deanonymized Misplacement                          :  0.437115542543
Post-Deanonymized Misplacement                         :  0.437115542543
Transformation Auto-Exclusion                          :           0.629
Number of Points Excluded From Geometric Transform     :             0.0
Rotation Theta                                         :             nan
Scaling                                                :             nan
Translation Magnitude                                  :             nan
TranslationY                                           :             nan
Geometric Distance Threshold                           :  0.635353265272
Post-Transform Misplacement                            :  0.376522353516
Number of Components                                   :           1.488
Accurate Single-Item Placements                        :           0.963
Inaccurate Single-Item Placements                      :           0.013
True Swaps                                             :           0.501
Partial Swaps                                          :           0.011
Cycle Swaps                                            :             0.0
Partial Cycle Swaps                                    :             0.0
Misassignment                                          :           1.024
Accurate Misassignment                                 :           1.009
Inaccurate Misassignment                               :           0.015
Swap Distance Threshold                                :  0.545266816696
True Swap Data Distance                                :             nan
True Swap Actual Distance                              :             nan
Partial Swap Data Distance                             :             nan
Partial Swap Actual Distance                           :             nan
Cycle Swap Data Distance                               :             nan
Cycle Swap Actual Distance                             :             nan
Partial Cycle Swap Data Distance                       :             nan
____________________________________________________________________________________________________
____________________________________________________________________________________________________
1000 iterations run in 6.77900004387 seconds (0.00677900004387 average) on {'dims': 2, 'random_source': 'naive2d', 'iterations': 1000, 'items': 3}.
____________________________________________________________________________________________________
Original Misplacement                                  :  0.515639194224
Original Swap                                          :           0.224
Original Edge Resizing                                 :  0.139782419186
Original Edge Distortion                               :           0.957
Pre-Processed Accurate Placements                      :            2.41
Pre-Processed Inaccurate Placements                    :            0.59
Pre-Processed Accuracy Threshold                       :  0.715056535894
Deanonymized Accurate Placements                       :           2.365
Deanonymized Inaccurate Placements                     :           0.635
Deanonymized Accuracy Threshold                        :  0.578895272067
Raw Deanonymized Misplacement                          :  0.392063839952
Post-Deanonymized Misplacement                         :  0.392063839952
Transformation Auto-Exclusion                          :           0.589
Number of Points Excluded From Geometric Transform     :           0.635
Rotation Theta                                         :             nan
Scaling                                                :             nan
Translation Magnitude                                  :             nan
TranslationY                                           :             nan
Geometric Distance Threshold                           :  0.578895272067
Post-Transform Misplacement                            :  0.346003052784
Number of Components                                   :           1.866
Accurate Single-Item Placements                        :           0.813
Inaccurate Single-Item Placements                      :           0.244
True Swaps                                             :           0.231
Partial Swaps                                          :           0.253
Cycle Swaps                                            :           0.094
Partial Cycle Swaps                                    :           0.231
Misassignment                                          :           1.943
Accurate Misassignment                                 :           1.459
Inaccurate Misassignment                               :           0.484
Swap Distance Threshold                                :  0.520299160435
True Swap Data Distance                                :             nan
True Swap Actual Distance                              :             nan
Partial Swap Data Distance                             :             nan
Partial Swap Actual Distance                           :             nan
Cycle Swap Data Distance                               :             nan
Cycle Swap Actual Distance                             :             nan
Partial Cycle Swap Data Distance                       :             nan
____________________________________________________________________________________________________
____________________________________________________________________________________________________
1000 iterations run in 7.42900013924 seconds (0.00742900013924 average) on {'dims': 2, 'random_source': 'naive2d', 'iterations': 1000, 'items': 4}.
____________________________________________________________________________________________________
Original Misplacement                                  :  0.519717581664
Original Swap                                          :  0.256666666667
Original Edge Resizing                                 :  0.165381193056
Original Edge Distortion                               :   1.00433333333
Pre-Processed Accurate Placements                      :           3.116
Pre-Processed Inaccurate Placements                    :           0.884
Pre-Processed Accuracy Threshold                       :  0.716558121909
Deanonymized Accurate Placements                       :           3.074
Deanonymized Inaccurate Placements                     :           0.926
Deanonymized Accuracy Threshold                        :  0.522998593706
Raw Deanonymized Misplacement                          :  0.353347878836
Post-Deanonymized Misplacement                         :  0.353347878836
Transformation Auto-Exclusion                          :           0.428
Number of Points Excluded From Geometric Transform     :           0.926
Rotation Theta                                         :             nan
Scaling                                                :             nan
Translation Magnitude                                  :             nan
TranslationY                                           :             nan
Geometric Distance Threshold                           :  0.522998593706
Post-Transform Misplacement                            :  0.308329761178
Number of Components                                   :           2.102
Accurate Single-Item Placements                        :           0.736
Inaccurate Single-Item Placements                      :           0.264
True Swaps                                             :           0.281
Partial Swaps                                          :           0.258
Cycle Swaps                                            :           0.138
Partial Cycle Swaps                                    :           0.425
Misassignment                                          :             3.0
Accurate Misassignment                                 :           2.315
Inaccurate Misassignment                               :           0.685
Swap Distance Threshold                                :  0.460527020805
True Swap Data Distance                                :             nan
True Swap Actual Distance                              :             nan
Partial Swap Data Distance                             :             nan
Partial Swap Actual Distance                           :             nan
Cycle Swap Data Distance                               :             nan
Cycle Swap Actual Distance                             :             nan
Partial Cycle Swap Data Distance                       :             nan
____________________________________________________________________________________________________
____________________________________________________________________________________________________
1000 iterations run in 8.34400010109 seconds (0.00834400010109 average) on {'dims': 2, 'random_source': 'naive2d', 'iterations': 1000, 'items': 5}.
____________________________________________________________________________________________________
Original Misplacement                                  :  0.519385256928
Original Swap                                          :          0.2413
Original Edge Resizing                                 :  0.188282557083
Original Edge Distortion                               :          0.9898
Pre-Processed Accurate Placements                      :            3.85
Pre-Processed Inaccurate Placements                    :            1.15
Pre-Processed Accuracy Threshold                       :  0.704434920488
Deanonymized Accurate Placements                       :           3.805
Deanonymized Inaccurate Placements                     :           1.195
Deanonymized Accuracy Threshold                        :  0.484544940284
Raw Deanonymized Misplacement                          :  0.333925883989
Post-Deanonymized Misplacement                         :  0.333925883989
Transformation Auto-Exclusion                          :           0.352
Number of Points Excluded From Geometric Transform     :           1.195
Rotation Theta                                         :             nan
Scaling                                                :             nan
Translation Magnitude                                  :             nan
TranslationY                                           :             nan
Geometric Distance Threshold                           :  0.484544940284
Post-Transform Misplacement                            :  0.290443691606
Number of Components                                   :           2.297
Accurate Single-Item Placements                        :           0.772
Inaccurate Single-Item Placements                      :           0.241
True Swaps                                             :           0.297
Partial Swaps                                          :           0.219
Cycle Swaps                                            :           0.147
Partial Cycle Swaps                                    :           0.621
Misassignment                                          :           3.987
Accurate Misassignment                                 :           3.039
Inaccurate Misassignment                               :           0.948
Swap Distance Threshold                                :  0.423739368366
True Swap Data Distance                                :             nan
True Swap Actual Distance                              :             nan
Partial Swap Data Distance                             :             nan
Partial Swap Actual Distance                           :             nan
Cycle Swap Data Distance                               :             nan
Cycle Swap Actual Distance                             :             nan
Partial Cycle Swap Data Distance                       :             nan
____________________________________________________________________________________________________
____________________________________________________________________________________________________
1000 iterations run in 9.42499995232 seconds (0.00942499995232 average) on {'dims': 2, 'random_source': 'naive2d', 'iterations': 1000, 'items': 6}.
____________________________________________________________________________________________________
Original Misplacement                                  :  0.522951395078
Original Swap                                          :  0.252533333333
Original Edge Resizing                                 :  0.203979206219
Original Edge Distortion                               :   1.00606666667
Pre-Processed Accurate Placements                      :           4.504
Pre-Processed Inaccurate Placements                    :           1.496
Pre-Processed Accuracy Threshold                       :  0.698669586206
Deanonymized Accurate Placements                       :           4.507
Deanonymized Inaccurate Placements                     :           1.493
Deanonymized Accuracy Threshold                        :  0.442475698717
Raw Deanonymized Misplacement                          :  0.309732269964
Post-Deanonymized Misplacement                         :  0.309732269964
Transformation Auto-Exclusion                          :           0.301
Number of Points Excluded From Geometric Transform     :           1.493
Rotation Theta                                         :             nan
Scaling                                                :             nan
Translation Magnitude                                  :             nan
TranslationY                                           :             nan
Geometric Distance Threshold                           :  0.442475698717
Post-Transform Misplacement                            :  0.268600686122
Number of Components                                   :            2.45
Accurate Single-Item Placements                        :           0.776
Inaccurate Single-Item Placements                      :           0.229
True Swaps                                             :           0.276
Partial Swaps                                          :           0.222
Cycle Swaps                                            :           0.175
Partial Cycle Swaps                                    :           0.772
Misassignment                                          :           4.995
Accurate Misassignment                                 :           3.769
Inaccurate Misassignment                               :           1.226
Swap Distance Threshold                                :  0.386872316421
True Swap Data Distance                                :             nan
True Swap Actual Distance                              :             nan
Partial Swap Data Distance                             :             nan
Partial Swap Actual Distance                           :             nan
Cycle Swap Data Distance                               :             nan
Cycle Swap Actual Distance                             :             nan
Partial Cycle Swap Data Distance                       :             nan
____________________________________________________________________________________________________
____________________________________________________________________________________________________
1000 iterations run in 10.5009999275 seconds (0.0105009999275 average) on {'dims': 2, 'random_source': 'naive2d', 'iterations': 1000, 'items': 7}.
____________________________________________________________________________________________________
Original Misplacement                                  :  0.520092110994
Original Swap                                          :  0.252095238095
Original Edge Resizing                                 :   0.21176356452
Original Edge Distortion                               :  0.999476190476
Pre-Processed Accurate Placements                      :           5.188
Pre-Processed Inaccurate Placements                    :           1.812
Pre-Processed Accuracy Threshold                       :   0.68567570481
Deanonymized Accurate Placements                       :           5.272
Deanonymized Inaccurate Placements                     :           1.728
Deanonymized Accuracy Threshold                        :  0.417763607624
Raw Deanonymized Misplacement                          :  0.294403545998
Post-Deanonymized Misplacement                         :  0.294403545998
Transformation Auto-Exclusion                          :            0.28
Number of Points Excluded From Geometric Transform     :           1.728
Rotation Theta                                         :             nan
Scaling                                                :             nan
Translation Magnitude                                  :             nan
TranslationY                                           :             nan
Geometric Distance Threshold                           :  0.417763607624
Post-Transform Misplacement                            :  0.256151561386
Number of Components                                   :           2.587
Accurate Single-Item Placements                        :           0.741
Inaccurate Single-Item Placements                      :           0.234
True Swaps                                             :           0.257
Partial Swaps                                          :           0.244
Cycle Swaps                                            :           0.233
Partial Cycle Swaps                                    :           0.878
Misassignment                                          :           6.025
Accurate Misassignment                                 :            4.53
Inaccurate Misassignment                               :           1.495
Swap Distance Threshold                                :  0.364572722472
True Swap Data Distance                                :             nan
True Swap Actual Distance                              :             nan
Partial Swap Data Distance                             :             nan
Partial Swap Actual Distance                           :             nan
Cycle Swap Data Distance                               :             nan
Cycle Swap Actual Distance                             :             nan
Partial Cycle Swap Data Distance                       :             nan
____________________________________________________________________________________________________
____________________________________________________________________________________________________
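Several rows in the reports above (for example "Rotation Theta" and the swap distance rows) show `nan`, and the run begins with a "Degrees of freedom <= 0 for slice" warning. A likely explanation, offered here as an assumption rather than something confirmed by the pipeline, is that those metrics were NaN in every iteration, so `np.nanmean` and `np.nanstd` had no values left to aggregate. The small example below uses made-up numbers to reproduce that behavior.

```python
import warnings

import numpy as np

# Two made-up iterations with two metrics each; the second metric is never defined.
outputs = np.array([[0.52, np.nan],
                    [0.48, np.nan]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    means = np.nanmean(outputs, axis=0)  # NaNs are skipped; an all-NaN column stays NaN
    stds = np.nanstd(outputs, axis=0)    # same rule, and NumPy warns about the empty slice

print(means)
print(stds)
for w in caught:
    print(w.message)
```

Columns like these could be dropped through `remove_columns`, or kept as a visible marker that the metric was never defined for random data.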

Load the data to confirm it saved properly.


In [ ]:
# Use a context manager so the file handle is closed after loading
with open(save_filename, "rb") as f:
    load_data = pickle.load(f)
print(load_data)
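Printing the whole dictionary confirms that the file loads, but a structural check can be easier to read. The sketch below round-trips a small stand-in dictionary through `pickle` and verifies that the keys written by `get_save_data` are present. The stand-in values and the temporary file name are made up for illustration.

```python
import os
import pickle
import tempfile

expected_keys = {'test_configs', 'output_labels', 'mean_outputs', 'std_outputs', 'times'}

# Illustrative stand-in for the notebook's checkpoint; the values are made up.
example = {key: [] for key in expected_keys}

path = os.path.join(tempfile.mkdtemp(), 'example_checkpoint.p')
with open(path, 'wb') as f:
    pickle.dump(example, f)
with open(path, 'rb') as f:
    restored = pickle.load(f)

# Report any missing keys, and check that each configuration has one mean row and one runtime.
missing = expected_keys - set(restored)
assert not missing, 'checkpoint is missing keys: {0}'.format(sorted(missing))
assert len(restored['mean_outputs']) == len(restored['times'])
```

The same two assertions could be applied to `load_data` after the cell above has run.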
