ATLeS - Descriptive Statistics

This notebook is designed to provide a general-purpose tool for producing descriptive statistics and visualizations for ATLeS data. The intent is to provide a basic framework for you to build on.

Instructions

Provide experiment details in the 'Parameters' section below, then execute the notebook to generate the statistics.

General Information

Every time an experiment is run, ATLeS generates three files; a hypothetical filename-parsing sketch follows the list below.

  1. date-time-experimentname.txt (log of tracking activity/issues)
  2. date-time-experimentname-setup.txt (details of experimental setup)
  3. date-time-experimentname-track.csv (track files; raw tracking data)
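For reference, the sketch below shows one way such a filename could be split back into its components. The helper name parse_atles_filename and the exact suffix handling are illustrative assumptions, not part of ATLeS.

from pathlib import Path

def parse_atles_filename(path):
    # hypothetical helper: split a 'date-time-experimentname[-kind]' filename into its parts
    name = Path(path).name
    # strip the known suffixes to leave 'date-time-experimentname'
    for suffix in ('-track.csv', '-setup.txt', '.txt'):
        if name.endswith(suffix):
            name = name[:-len(suffix)]
            break
    date, time, experiment = name.split('-', 2)
    return date, time, experiment

# e.g. parse_atles_filename('20180219-162833-ACTEST2-track.csv') -> ('20180219', '162833', 'ACTEST2')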

Broadly this notebook will:

  1. grab the relevant data sources (see above) and integrate them
  2. clean up the data a bit
  3. summarize the data a bit
  4. visualize the data a bit

To do:

Function to check for duplicates and remove empty rows from the dataframe (see the sketch below).
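A minimal sketch of what that cleaning helper might look like. The name check_and_clean and the specific checks are assumptions to be adapted to the combined dataframe built later in the notebook.

def check_and_clean(df):
    # hypothetical helper: warn about duplicate track names and drop rows with no data
    dupes = df.index[df.index.duplicated()].tolist()
    if dupes:
        print(f'WARNING: duplicate tracks found: {dupes}')
    cleaned = df[~df.index.duplicated(keep='first')]  # keep the first copy of any duplicated track
    cleaned = cleaned.dropna(axis=0, how='all')       # drop rows where every value is missing
    return cleaned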

Import Libraries


In [1]:
from pathlib import Path
import configparser
import numpy as np
import pandas as pd
import seaborn
import matplotlib.pyplot as plt
import pingouinparametrics as pp # local module providing the parametric tests used below (anova, rm_anova)

# add src/ directory to path to import ATLeS code
import os
import sys
module_path = os.path.abspath(os.path.join('..', 'src'))
if module_path not in sys.path:
    sys.path.append(module_path)

# imported from ATLeS
from analysis.process import TrackProcessor
from analysis.plot import TrackPlotter

# displays plots in notebook output
%matplotlib inline

Parameters

Input experiment details here:


In [2]:
experimentname = 'ACTEST2'
trackdirectory = '../data/tracks'
experimenttype = 'extinction' # Set to 'extinction' or 'none'. Supplemental analyses are generated for extinction experiments.

Set analysis options here:


In [3]:
acquisitionlevel = .85  # Sets the cutoff (proportion of valid datapoints) below which tracks are excluded for poor tracking.
notriggerexclude = True  # If True, excludes tracks where the trigger never occurred during the learning phase; if False, such tracks are retained.

Globals


In [4]:
framelist = [] # Collects frames generated for eventual combination

Identify the Data Files

Finds the track and settings files within the track directory that match the experiment name and creates lists of track and settings files.


In [5]:
trackfiles = list(Path(trackdirectory).glob(f'**/*{experimentname}*track.csv'))
settingsfiles = list(Path(trackdirectory).glob(f'**/*{experimentname}*setup.txt'))
    
print(f'{len(trackfiles)} track files were found with the name {experimentname}')
print(f'{len(settingsfiles)} settings files were found with the name {experimentname}\n')

if len(trackfiles) != len(settingsfiles):
    print('WARNING: Mismatched track and settings files.')


61 track files were found with the name ACTEST2
61 settings files were found with the name ACTEST2

Identify and Store Experimental Settings

The number of experimental phases varies across experiments. This block identifies the phases used for the current experiment and verifies that all tracks have the same phase information.

The settings may vary between tracks within an experiment. This block also identifies the settings for each track and writes them to a dictionary.
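As a concrete illustration of the naming logic in the next cell, the phase list for this experiment parses into three time/stimulus pairs, which become the column names used later in the phase dataframe:

# worked example using the phase list from this experiment (see the outputs below)
phaselist = ['5', 'off', '5', 'on', '10', 'off']  # alternating phase duration (min) and stimulus state
phasenumber = len(phaselist) // 2                 # -> 3 phases
phasenames = []
for i in range(phasenumber):
    phasenames.extend((f'phase{i+1}time', f'phase{i+1}stim'))
# phasenames -> ['phase1time', 'phase1stim', 'phase2time', 'phase2stim', 'phase3time', 'phase3stim']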


In [6]:
Config = configparser.ConfigParser()

settingsdic = {} # Dictionary used to store all settings information.
phaselist = [] # List of phases used to verify phases are consistent across tracks.


# reads and organizes information from each settings file
for file in settingsfiles:
    Config.read(file)
    
    # generate clean list of stimuli
    stiminfo = Config.get('experiment', 'stimulus') # gets stim info
    stiminfo = stiminfo.replace('(', ',').replace(')', '').replace(' ', '').split(',')[1:] # cleans stim list
    
    # generate clean list of phases
    phaselisttemp = Config.get('phases', 'phases_argstrings') # gets phase info
    phaselisttemp = phaselisttemp.replace('-p ', '').replace(' ', '').split(',')[:-1] # cleans phase list

    # compare each phase list with the list from the previous settings file
    if len(phaselist) == 0:
        phaselist = phaselisttemp
    elif phaselist != phaselisttemp:
        print('Warning: Inconsistent phases between settings files.')

    # counts phases and generates phase variable names
    phasenumber = len(phaselist)//2
    phasenames = []
    for i in range(phasenumber):
        p, t, s = 'phase', 'time', 'stim'
        phase = p+str(i+1)
        phasetime = phase + t
        phasestim = phase + s
        phasenames.extend((phasetime, phasestim))

    # gets settings info from filename (track/box)
    trackname = file.parts[-1].replace("-setup.txt", "")
    box = file.parts[-2]

    # gets settings info from setting file
    controller = Config.get('experiment', 'controller')
    trigger = Config.get('experiment', 'trigger')

    settings = [phaselisttemp, controller, trigger, stiminfo, box, str(file)]
    
    # puts all settings in dic keyed to trackname
    settingsdic[trackname] = settings

# creates settings dataframe from settingsdic
dfsettings = pd.DataFrame(settingsdic).transpose()
dfsettings.columns = ['phases', 'controller', 'trigger', 'stimulus', 'box', 'file']
dfsettings['track'] = dfsettings.index

# creates stimulus dataframe, splits up and names stims
dfstim = pd.DataFrame(dfsettings.stimulus.values.tolist(), index=dfsettings.index).fillna('-')

for col in range(dfstim.shape[1]):
    dfstim = dfstim.rename(columns={col: 'stim_setting' + str(col)})

framelist.append(dfsettings)
dfsettings.head(3)


Out[6]:
phases controller trigger stimulus box file track
20180219-162833-ACTEST2 [5, off, 5, on, 10, off] controllers.FixedRatioController(response_step=1) xpos < 0.50 [ac_freq_Hz=20, active_freq_Hz=1, duty_cycle=0... bbox3 ../data/tracks/bbox3/20180219-162833-ACTEST2-s... 20180219-162833-ACTEST2
20180303-140323-ACTEST2 [5, off, 5, on, 10, off] controllers.FixedRatioController(response_step=1) xpos < 0.50 [ac_freq_Hz=20, active_freq_Hz=1, duty_cycle=0... bbox3 ../data/tracks/bbox3/20180303-140323-ACTEST2-s... 20180303-140323-ACTEST2
20180213-162455-ACTEST2 [5, off, 5, on, 10, off] controllers.FixedRatioController(response_step=1) xpos > 0.50 [ac_freq_Hz=20, active_freq_Hz=1, duty_cycle=0... bbox3 ../data/tracks/bbox3/20180213-162455-ACTEST2-s... 20180213-162455-ACTEST2

Identify Phase Times and Create Phase Dataframe

This block extracts the phase information from the settings dictionary and calculates the phase times in seconds.

This code currently assumes all phase times are the same across tracks within the experiment. It will need to be rewritten if we want to start running analyses across multiple studies with different phase times; a possible per-track approach is sketched below.
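As a rough sketch of that rewrite (the names here are illustrative assumptions), phase boundaries could instead be computed per track and stored in a dictionary keyed by track name:

# hypothetical per-track phase boundaries, keyed by track name
phasedic_bytrack = {}
for tname, settings in settingsdic.items():
    durations = [int(x) * 60 for x in settings[0] if x.isdigit()]  # phase lengths in seconds
    boundaries = [0]
    for d in durations:
        boundaries.append(boundaries[-1] + d)
    # (start, end) pair for each phase of this track
    phasedic_bytrack[tname] = {i + 1: (boundaries[i], boundaries[i + 1]) for i in range(len(durations))}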


In [7]:
phaseinfo = settingsdic.get(trackname)[0] # phase list from the last settings file read above (phases are assumed consistent across tracks)
phaseinfo = [x for x in phaseinfo if any(c.isdigit() for c in x)]
phaseinfo = list(map(int, phaseinfo))
phaseinfo = [i * 60 for i in phaseinfo]
phaselen = len(phaseinfo)

phaset = []
for i in range(phaselen):
    times = sum(phaseinfo[0:i+1])
    phaset.append(times)

phaset.insert(0, 0) # prepends 0 so the first phase starts at time zero

phasedic = {}
for i in range(phaselen):
    phasedic[i+1] = [phaset[i], phaset[i+1]]


# splits up and names the phases
dfphase = pd.DataFrame(dfsettings.phases.values.tolist(), index=dfsettings.index).fillna('-')
dfphase.columns = phasenames

phasenum = len(dfphase.columns)//2

framelist.append(dfphase)

dfphase.head(3)


Out[7]:
phase1time phase1stim phase2time phase2stim phase3time phase3stim
20180219-162833-ACTEST2 5 off 5 on 10 off
20180303-140323-ACTEST2 5 off 5 on 10 off
20180213-162455-ACTEST2 5 off 5 on 10 off

Generate Basic Stats


In [8]:
dfstats = pd.DataFrame()

for track in trackfiles:
    # gets track from file name
    trackname = track.parts[-1].replace("-track.csv", "")
    
    # gets stats from TrackProcessor (ATLeS analysis class)
    processor = TrackProcessor(str(track), normalize_x_with_trigger='xpos < 0.50')
    tempstatsdic = processor.get_stats(include_phases=True) # gets stats from track object
    
    # flattens dictionary into dataframe, from https://stackoverflow.com/questions/13575090/
    dftemp = pd.DataFrame.from_dict({(i,j): tempstatsdic[i][j] for i in tempstatsdic.keys() for j in tempstatsdic[i].keys()}, orient='index')
    
    #transposes dataframe and adds track as index
    dftemp = dftemp.transpose()
    dftemp['track'] = trackname 
    dftemp.set_index('track', inplace=True)
    
    dfstats = dfstats.append(dftemp, sort=True)

if 'phase 0' in dfstats.columns:
    dfstats.rename({'phase 0': 'p1', 'phase 1': 'p2', 'phase 2': 'p3'}, axis='columns', inplace = True)

dfstats.columns = dfstats.columns.map('|'.join)
    
framelist.append(dfstats)    
    
dfstats.head(3)


Out[8]:
all|#Datapoints all|#Freezes all|#Triggers all|#Valid all|%Valid datapoints all|Avg. normed x coordinate all|Avg. speed (?/sec) all|Avg. time per freeze (sec) all|Avg. time per trigger (sec) all|Avg. x coordinate ... phase 3|Avg. x speed (?/sec) phase 3|Avg. y coordinate phase 3|Avg. y speed (?/sec) phase 3|Freeze frequency (per min) phase 3|Total distance traveled (?) phase 3|Total time (sec) phase 3|Total time frozen (sec) phase 3|Total time triggered (sec) phase 3|Trigger frequency (per min) phase 3|Valid time (sec)
track
20180303-140323-ACTEST2 11989.0 4.0 162.0 11978.0 0.999082 0.453400 0.211504 39.397750 2.606121 0.546600 ... 0.113900 0.240247 0.053649 0.200015 81.223056 599.9547 146.8919 202.1632 5.600423 599.6567
20180219-162833-ACTEST2 11992.0 39.0 77.0 11974.0 0.998499 0.336595 0.090913 5.821121 2.409019 0.663405 ... 0.048981 0.137688 0.023124 2.400172 35.668226 599.9571 119.4581 90.5968 1.600114 599.7578
20180126-162646-ACTEST2 11992.0 13.0 68.0 11982.0 0.999166 0.356098 0.057592 25.261700 6.607431 0.356098 ... 0.037321 0.451903 0.029200 0.900067 31.472323 599.9556 49.3343 300.1005 2.800207 599.9556

3 rows × 80 columns

Generate Extinction Stats


In [9]:
dfextstats = pd.DataFrame() # initialized here so the final head() below also works for non-extinction runs

if experimenttype == 'extinction':

    for track in trackfiles:
        # gets track name from file name
        trackname = track.parts[-1].replace("-track.csv", "")

        # gets advanced stats from TrackProcessor (ATLeS analysis class)
        processor = TrackProcessor(str(track)) # passes track to TrackProcessor and returns a track object
        tempstatsdic = processor.get_exp_stats('extinction') # gets extinction stats from track object

        dftemp3 = pd.DataFrame(tempstatsdic, index=[0])

        dftemp3['track'] = trackname
        dftemp3.set_index('track', inplace=True)

        dfextstats = dfextstats.append(dftemp3, sort=True)

    framelist.append(dfextstats)

else:
    print('Extinction experiment not selected in Parameters section.')


dfextstats.head(3)


Out[9]:
phase1-tme to 1st trigger frm phs start phase1-tme to 1st trigger_plus1 frm phs start phase1-tme to 1st trigger_plus2 frm phs start phase1-tme to 1st trigger_plus3 frm phs start phase1-tme to 1st trigger_plus4 frm phs start phase2-tme to 1st trigger frm phs start phase2-tme to 1st trigger frm prev trigger phase2-tme to 1st trigger_plus1 frm phs start phase2-tme to 1st trigger_plus1 frm prev trigger phase2-tme to 1st trigger_plus2 frm phs start ... phase3-tme to 1st trigger frm phs start phase3-tme to 1st trigger frm prev trigger phase3-tme to 1st trigger_plus1 frm phs start phase3-tme to 1st trigger_plus1 frm prev trigger phase3-tme to 1st trigger_plus2 frm phs start phase3-tme to 1st trigger_plus2 frm prev trigger phase3-tme to 1st trigger_plus3 frm phs start phase3-tme to 1st trigger_plus3 frm prev trigger phase3-tme to 1st trigger_plus4 frm phs start phase3-tme to 1st trigger_plus4 frm prev trigger
track
20180303-140323-ACTEST2 0.0048 0.0048 1.2063 1.2063 10.1036 206.9687 223.7824 207.3683 224.1820 207.8689 ... 0.0399 0.1000 0.0399 0.1000 0.0399 0.1 NaN NaN NaN NaN
20180219-162833-ACTEST2 0.0072 0.0072 1.6063 1.7044 1.8050 84.3770 370.1737 84.5793 370.3760 86.5792 ... 0.0454 0.1041 0.0454 0.1041 NaN NaN NaN NaN NaN NaN
20180126-162646-ACTEST2 0.0042 0.0042 0.7058 0.7058 0.8039 148.4718 438.3684 149.0718 438.9684 149.5718 ... 0.0393 0.0999 0.0393 0.0999 NaN NaN NaN NaN NaN NaN

3 rows × 25 columns

Combine Dataframes

Combines the settings, phase, basic stats, and (for extinction experiments) extinction stats dataframes into a single dataframe.


In [10]:
df = pd.concat(framelist, axis=1, sort=False) # combines all frames
df.dropna(axis=0, how='all', inplace=True) # drops any rows where all values are missing
df.head(3)


Out[10]:
phases controller trigger stimulus box file track phase1time phase1stim phase2time ... phase3-tme to 1st trigger frm phs start phase3-tme to 1st trigger frm prev trigger phase3-tme to 1st trigger_plus1 frm phs start phase3-tme to 1st trigger_plus1 frm prev trigger phase3-tme to 1st trigger_plus2 frm phs start phase3-tme to 1st trigger_plus2 frm prev trigger phase3-tme to 1st trigger_plus3 frm phs start phase3-tme to 1st trigger_plus3 frm prev trigger phase3-tme to 1st trigger_plus4 frm phs start phase3-tme to 1st trigger_plus4 frm prev trigger
20180219-162833-ACTEST2 [5, off, 5, on, 10, off] controllers.FixedRatioController(response_step=1) xpos < 0.50 [ac_freq_Hz=20, active_freq_Hz=1, duty_cycle=0... bbox3 ../data/tracks/bbox3/20180219-162833-ACTEST2-s... 20180219-162833-ACTEST2 5 off 5 ... 0.0454 0.1041 0.0454 0.1041 NaN NaN NaN NaN NaN NaN
20180303-140323-ACTEST2 [5, off, 5, on, 10, off] controllers.FixedRatioController(response_step=1) xpos < 0.50 [ac_freq_Hz=20, active_freq_Hz=1, duty_cycle=0... bbox3 ../data/tracks/bbox3/20180303-140323-ACTEST2-s... 20180303-140323-ACTEST2 5 off 5 ... 0.0399 0.1000 0.0399 0.1000 0.0399 0.1 NaN NaN NaN NaN
20180213-162455-ACTEST2 [5, off, 5, on, 10, off] controllers.FixedRatioController(response_step=1) xpos > 0.50 [ac_freq_Hz=20, active_freq_Hz=1, duty_cycle=0... bbox3 ../data/tracks/bbox3/20180213-162455-ACTEST2-s... 20180213-162455-ACTEST2 5 off 5 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

3 rows × 118 columns

Cleaning: Dataframe Characteristics


In [11]:
print(f'Dataframe Shape:{df.shape}')
print()   

print('Column Names by DataType')
for dt in df.dtypes.unique():
    print(f'Data Type, {dt}:')
    print(*list(df.select_dtypes(include=[dt]).columns), sep = ', ')
    print()

# print('Number of Tracks with Null Data by Column:')
# print(df.isnull().sum()[df.isnull().sum() > 0])
# print()


Dataframe Shape:(61, 118)

Column Names by DataType
Data Type, object:
phases, controller, trigger, stimulus, box, file, track, phase1time, phase1stim, phase2time, phase2stim, phase3time, phase3stim

Data Type, float64:
all|#Datapoints, all|#Freezes, all|#Triggers, all|#Valid, all|%Valid datapoints, all|Avg. normed x coordinate, all|Avg. speed (?/sec), all|Avg. time per freeze (sec), all|Avg. time per trigger (sec), all|Avg. x coordinate, all|Avg. x speed (?/sec), all|Avg. y coordinate, all|Avg. y speed (?/sec), all|Freeze frequency (per min), all|Total distance traveled (?), all|Total time (sec), all|Total time frozen (sec), all|Total time triggered (sec), all|Trigger frequency (per min), all|Valid time (sec), phase 1|#Datapoints, phase 1|#Freezes, phase 1|#Triggers, phase 1|#Valid, phase 1|%Valid datapoints, phase 1|Avg. normed x coordinate, phase 1|Avg. speed (?/sec), phase 1|Avg. time per freeze (sec), phase 1|Avg. time per trigger (sec), phase 1|Avg. x coordinate, phase 1|Avg. x speed (?/sec), phase 1|Avg. y coordinate, phase 1|Avg. y speed (?/sec), phase 1|Freeze frequency (per min), phase 1|Total distance traveled (?), phase 1|Total time (sec), phase 1|Total time frozen (sec), phase 1|Total time triggered (sec), phase 1|Trigger frequency (per min), phase 1|Valid time (sec), phase 2|#Datapoints, phase 2|#Freezes, phase 2|#Triggers, phase 2|#Valid, phase 2|%Valid datapoints, phase 2|Avg. normed x coordinate, phase 2|Avg. speed (?/sec), phase 2|Avg. time per freeze (sec), phase 2|Avg. time per trigger (sec), phase 2|Avg. x coordinate, phase 2|Avg. x speed (?/sec), phase 2|Avg. y coordinate, phase 2|Avg. y speed (?/sec), phase 2|Freeze frequency (per min), phase 2|Total distance traveled (?), phase 2|Total time (sec), phase 2|Total time frozen (sec), phase 2|Total time triggered (sec), phase 2|Trigger frequency (per min), phase 2|Valid time (sec), phase 3|#Datapoints, phase 3|#Freezes, phase 3|#Triggers, phase 3|#Valid, phase 3|%Valid datapoints, phase 3|Avg. normed x coordinate, phase 3|Avg. speed (?/sec), phase 3|Avg. time per freeze (sec), phase 3|Avg. time per trigger (sec), phase 3|Avg. x coordinate, phase 3|Avg. x speed (?/sec), phase 3|Avg. y coordinate, phase 3|Avg. y speed (?/sec), phase 3|Freeze frequency (per min), phase 3|Total distance traveled (?), phase 3|Total time (sec), phase 3|Total time frozen (sec), phase 3|Total time triggered (sec), phase 3|Trigger frequency (per min), phase 3|Valid time (sec), phase1-tme to 1st trigger frm phs start, phase1-tme to 1st trigger_plus1 frm phs start, phase1-tme to 1st trigger_plus2 frm phs start, phase1-tme to 1st trigger_plus3 frm phs start, phase1-tme to 1st trigger_plus4 frm phs start, phase2-tme to 1st trigger frm phs start, phase2-tme to 1st trigger frm prev trigger, phase2-tme to 1st trigger_plus1 frm phs start, phase2-tme to 1st trigger_plus1 frm prev trigger, phase2-tme to 1st trigger_plus2 frm phs start, phase2-tme to 1st trigger_plus2 frm prev trigger, phase2-tme to 1st trigger_plus3 frm phs start, phase2-tme to 1st trigger_plus3 frm prev trigger, phase2-tme to 1st trigger_plus4 frm phs start, phase2-tme to 1st trigger_plus4 frm prev trigger, phase3-tme to 1st trigger frm phs start, phase3-tme to 1st trigger frm prev trigger, phase3-tme to 1st trigger_plus1 frm phs start, phase3-tme to 1st trigger_plus1 frm prev trigger, phase3-tme to 1st trigger_plus2 frm phs start, phase3-tme to 1st trigger_plus2 frm prev trigger, phase3-tme to 1st trigger_plus3 frm phs start, phase3-tme to 1st trigger_plus3 frm prev trigger, phase3-tme to 1st trigger_plus4 frm phs start, phase3-tme to 1st trigger_plus4 frm prev trigger

Cleaning: Early Termination Check


In [12]:
print(f'''Track Times: Mean {df['all|Total time (sec)'].mean()}, Minimum {df['all|Total time (sec)'].min()}, Maximum {df['all|Total time (sec)'].max()}, Count {df['all|Total time (sec)'].count()}''')

fig, ax = plt.subplots(1, 1, figsize=(6, 6))
ax.ticklabel_format(useOffset=False) # prevents appearance of scientific notation on y axis
df.boxplot(column='all|Total time (sec)', by='box', ax=ax)


Track Times: Mean 1160.2349475409835, Minimum 15.408500000000002, Maximum 1199.0315, Count 61
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x2ac36bff0c18>

Cleaning: Poor Tracking Check


In [13]:
print(f'''Valid Datapoints: Mean {df['all|%Valid datapoints'].mean()}, Minimum {df['all|%Valid datapoints'].min()}, Maximum {df['all|%Valid datapoints'].max()}, Count {df['all|%Valid datapoints'].count()}''')

fig, ax = plt.subplots(1, 1, figsize=(6, 6))
df.boxplot(column='all|%Valid datapoints', by='box', ax=ax)


Valid Datapoints: Mean 0.9755440048521484, Minimum 0.19087725150100066, Maximum 1.0, Count 61
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x2ac36e25d940>

Cleaning: No Trigger Check


In [14]:
print(f'''Number of Triggers: Phase 2 Mean {df['phase 2|#Triggers'].mean()}, Overall Minimum {df['all|#Triggers'].min()}, Overall Maximum {df['all|#Triggers'].max()}, Count {df['all|#Triggers'].count()}''')

fig, ax = plt.subplots(1, 1, figsize=(6, 6))
df.boxplot(column='phase 2|#Triggers', by='box', ax=ax)


Number of Triggers: Phase 2 Mean 10.310344827586206, Overall Minimum 1.0, Overall Maximum 192.0, Count 61
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x2ac36e30c9e8>

Cleaning: Removing Tracks for Early Termination, Poor Tracking, No Trigger


In [15]:
print(f'Raw Track Number: {df.shape[0]}')

df = df.drop(df[df['all|Total time (sec)'] < (df['all|Total time (sec)'].mean()) * .75].index) # drops tracks shorter than 75% of the mean track time (removes early-termination tracks)
print(f'Modified Track Number: {df.shape[0]} (following removal of tracks shorter than 75% of the mean track time)')

df = df.drop(df[df['all|%Valid datapoints'] < acquisitionlevel].index) # drops tracks below the acquisition-level cutoff
print(f'Modified Track Number: {df.shape[0]} (following removal of tracks with less than {acquisitionlevel:.0%} valid datapoints)')

if notriggerexclude:
    df = df.drop(df[df['phase 2|#Triggers'] == 0].index) # drops rows with no trigger during phase 2; NOTE: fix this so it works if the learning phase is not phase 2
    print(f'Modified Track Number: {df.shape[0]} (following removal of tracks with no triggers during the learning phase)')


Raw Track Number: 61
Modified Track Number: 59 (following removal of tracks shorter than 75% of the mean track time)
Modified Track Number: 56 (following removal of tracks with less than 85% valid datapoints)
Modified Track Number: 52 (following removal of tracks with no triggers during the learning phase)

Cleaning: Checking Randomization of Trigger Condition


In [16]:
dftrig = df.groupby('box')['trigger'].describe()
dftrig


Out[16]:
count unique top freq
box
bbox1 15 2 xpos > 0.50 8
bbox2 13 2 xpos < 0.50 7
bbox3 8 2 xpos > 0.50 5
bbox4 6 1 xpos < 0.50 6
bbox5 10 2 xpos > 0.50 7

In [17]:
boxlist = df.box.unique().tolist() # creates a list of all boxes in the experiment

onetriglist = dftrig.index[dftrig.unique < 2].tolist() # creates a list of boxes with fewer than 2 trigger conditions

boxlist = [x for x in boxlist if x not in onetriglist] # removes boxes with fewer than 2 trigger conditions

if len(onetriglist) > 0:
    print(f'WARNING: The following boxes had only one trigger condition: {onetriglist}. These boxes removed from trigger analyses below.')


WARNING: The following boxes had only one trigger condition: ['bbox4']. These boxes removed from trigger analyses below.

In [18]:
print(f'Trigger Conditions: {df.trigger.unique()}')
print()

from scipy.stats import ttest_ind

# performs Welch's t-test (does not assume equal variances) on all float columns and prints any that are significantly different as a function of trigger condition (using a Bonferroni-corrected alpha)
for i in df.select_dtypes(include=['float64']).columns:
    for b in boxlist:
        dfbox = df[df.box == b]
        ttest_result = ttest_ind(dfbox[dfbox.trigger == dfbox.trigger.unique()[0]][i], dfbox[dfbox.trigger == dfbox.trigger.unique()[1]][i], equal_var=False, nan_policy='omit')

        if ttest_result.pvalue < (.05/len(df.select_dtypes(include=['float64']).columns)):
            print(i)
            print(f'   {b}: Welchs T-Test indicates significant difference by trigger condition, p = {ttest_result.pvalue}')
            print(f'      Trigger Condition 1 Mean: {dfbox[dfbox.trigger == dfbox.trigger.unique()[0]][i].mean()}')
            print(f'      Trigger Condition 2 Mean: {dfbox[dfbox.trigger == dfbox.trigger.unique()[1]][i].mean()}')
            print()


Trigger Conditions: ['xpos < 0.50' 'xpos > 0.50']

all|Avg. x coordinate
   bbox1: Welchs T-Test indicates significant difference by trigger condition, p = 0.00024747642545408353
      Trigger Condition 1 Mean: 0.43569297360059644
      Trigger Condition 2 Mean: 0.6920929014853393

all|Avg. x coordinate
   bbox2: Welchs T-Test indicates significant difference by trigger condition, p = 0.0002029443327951502
      Trigger Condition 1 Mean: 0.64120703783913
      Trigger Condition 2 Mean: 0.3919377237266763

phase 2|Avg. x coordinate
   bbox5: Welchs T-Test indicates significant difference by trigger condition, p = 4.170130087876906e-05
      Trigger Condition 1 Mean: 0.7293638370210188
      Trigger Condition 2 Mean: 0.2752628949147006

phase 2|Avg. x coordinate
   bbox1: Welchs T-Test indicates significant difference by trigger condition, p = 1.4770805728989706e-08
      Trigger Condition 1 Mean: 0.2904322437457625
      Trigger Condition 2 Mean: 0.7896514808336045

phase 2|Avg. x coordinate
   bbox2: Welchs T-Test indicates significant difference by trigger condition, p = 1.8160212017251937e-08
      Trigger Condition 1 Mean: 0.7455221202831153
      Trigger Condition 2 Mean: 0.26757720851561123

Cleaning: Checking for Box Variations

Conducts one-way ANOVAs with box as the independent variable and each float column as the dependent variable, using a Bonferroni correction (with 105 float columns here, the corrected alpha is 0.05/105 ≈ 0.00048).


In [31]:
def betweensubjectANOVA(dependentvar, betweenfactor, suppress):
    try:
        anovaresult = pp.anova(dv=dependentvar, between=betweenfactor, data=df, detailed=True, export_filename=None)
        pvalue = anovaresult.loc[anovaresult.Source == betweenfactor]['p-unc'].values[0]

        # Bonferroni-corrected threshold: .05 divided by the number of float columns tested
        if pvalue >= .05 / len(df.select_dtypes(include=['float64']).columns):
            if not suppress:
                print(f'{dependentvar}')
                print(f'   NOT significant: One-way ANOVA conducted testing {betweenfactor} as significant predictor of {dependentvar}. P = {pvalue}')
                print()
        else:
            print(f'{dependentvar}')
            print(f'   SIGNIFICANT: One-way ANOVA conducted testing {betweenfactor} as significant predictor of {dependentvar}. P = {pvalue}')
            fig, ax = plt.subplots(1, 1, figsize=(6, 6))
            df.boxplot(column=dependentvar, by=betweenfactor, ax=ax)
            print()

    except Exception:
        print(f'{dependentvar} analysis failed. Check descriptives.')

for col in df.select_dtypes(include=['float64']).columns:
    betweensubjectANOVA(col, 'box', True)


phase3-tme to 1st trigger frm phs start
   SIGNIFICANT: One-way ANOVA conducted testing box as significant predictor of phase3-tme to 1st trigger frm phs start. P = 3.806275018506156e-27

phase3-tme to 1st trigger_plus1 frm phs start
   SIGNIFICANT: One-way ANOVA conducted testing box as significant predictor of phase3-tme to 1st trigger_plus1 frm phs start. P = 5.830196408402104e-27

phase3-tme to 1st trigger_plus2 frm phs start
   SIGNIFICANT: One-way ANOVA conducted testing box as significant predictor of phase3-tme to 1st trigger_plus2 frm phs start. P = 7.313705616602799e-37

/home/bsheese/atles/notebooks/pingouinparametrics.py:899: RuntimeWarning: divide by zero encountered in double_scalars
  fval = msbetween / mserror
phase3-tme to 1st trigger_plus3 frm phs start
   SIGNIFICANT: One-way ANOVA conducted testing box as significant predictor of phase3-tme to 1st trigger_plus3 frm phs start. P = 0.0

phase3-tme to 1st trigger_plus3 frm prev trigger
   SIGNIFICANT: One-way ANOVA conducted testing box as significant predictor of phase3-tme to 1st trigger_plus3 frm prev trigger. P = 0.0

phase3-tme to 1st trigger_plus4 frm phs start analysis failed. Check descriptives.
/home/bsheese/atles/notebooks/pingouinparametrics.py:899: RuntimeWarning: invalid value encountered in double_scalars
  fval = msbetween / mserror
/home/bsheese/atles/notebooks/pingouinparametrics.py:904: RuntimeWarning: invalid value encountered in double_scalars
  np2 = ssbetween / (ssbetween + sserror)
phase3-tme to 1st trigger_plus4 frm prev trigger analysis failed. Check descriptives.

Analysis - Preliminary Visualizations - X and Y by Box


In [33]:
fig, ax = plt.subplots(1, 3, figsize=(15, 6), sharey=True)
 
df.boxplot(column=['phase 1|Avg. normed x coordinate', 'phase 2|Avg. normed x coordinate', 'phase 3|Avg. normed x coordinate'], by='box', ax=ax)


/share/apps/jhub/lib64/python3.6/site-packages/pandas/plotting/_core.py:2257: UserWarning: When passing multiple axes, sharex and sharey are ignored. These settings must be specified when creating axes
  return_type=return_type, **kwds)
Out[33]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x2ac36f2c4a90>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x2ac36ebc8438>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x2ac36e88c470>],
      dtype=object)

In [21]:
fig, ax = plt.subplots(1, 3, figsize=(15, 6), sharey=True)
 
df.boxplot(column=['phase 1|Avg. y coordinate', 'phase 2|Avg. y coordinate', 'phase 3|Avg. y coordinate'], by='box', ax=ax)


/share/apps/jhub/lib64/python3.6/site-packages/pandas/plotting/_core.py:2257: UserWarning: When passing multiple axes, sharex and sharey are ignored. These settings must be specified when creating axes
  return_type=return_type, **kwds)
Out[21]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x2ac36eff95c0>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x2ac36eddb4a8>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x2ac36ee029b0>],
      dtype=object)

Analysis - Preliminary Visualizations - X and Y by Phase


In [22]:
fig, ax = plt.subplots(1, 1, figsize=(15, 6)) 
df.boxplot(column=['phase 1|Avg. normed x coordinate', 'phase 2|Avg. normed x coordinate', 'phase 3|Avg. normed x coordinate'], ax=ax)


Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x2ac36f19b048>

In [34]:
fig, ax = plt.subplots(1, 1, figsize=(15, 6)) 
df.boxplot(column=['phase 1|Avg. y coordinate', 'phase 2|Avg. y coordinate', 'phase 3|Avg. y coordinate'], ax=ax)


Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x2ac36e3fe978>

Analysis - Preliminary Visualizations - Heatmaps Per Phase


In [26]:
plotter = TrackPlotter(processor) # plots the last track processed above (the TrackProcessor from the final loop iteration)
plotter.plot_heatmap(plot_type='per-phase')



In [28]:
# 'phase 1|Avg. normed x coordinate', 'phase 2|Avg. normed x coordinate', 'phase 3|Avg. normed x coordinate'

# aov = rm_anova(dv='DV', within='Time', data=df, correction='auto', remove_na=True, detailed=True, export_filename=None)
# print_table(aov)

In [29]:
# reshapes the per-phase columns into long format (one row per track per phase) for a repeated-measures ANOVA
dependentvar = 'Avg. normed x coordinate'
dfanova = pd.DataFrame()

for phasenumcount in range(1, phasenum + 1):
    colname = f'phase {phasenumcount}|{dependentvar}'
    dftemp = df[[colname]].copy()
    dftemp.columns = [dependentvar]
    dftemp['phase'] = phasenumcount
    dfanova = dfanova.append(dftemp)

pp.rm_anova(dv=dependentvar, within='phase', data=dfanova, correction='auto', remove_na=True, detailed=False, export_filename=None)


Out[29]:
Source ddof1 ddof2 F p-unc np2
0 phase 2 102 73.964247 1.412552e-20 0.591883