This script is designed to provide a general purpose tool for producing descriptive statistics and visualizations for ATLeS data. The intent is that this notebook will provide a basic framework for you to build on.
Provide experiment details in the 'Parameters' section below, then execute notebook to generate stats.
Every time an experiment is run, ATLeS generates three files.
Broadly this notebook will:
In [1]:
from pathlib import Path
import configparser
import numpy as np
import pandas as pd
import seaborn
import matplotlib.pyplot as plt
import pingouinparametrics as pp
# add src/ directory to path to import ATLeS code
import os
import sys
module_path = os.path.abspath(os.path.join('..', 'src'))
if module_path not in sys.path:
sys.path.append(module_path)
# imported from ATLeS
from analysis.process import TrackProcessor
from analysis.plot import TrackPlotter
# displays plots in notebook output
%matplotlib inline
In [2]:
# --- Parameters ----------------------------------------------------------
# Experiment name: used to glob for the matching track/settings files.
experimentname = 'ACTEST2'
# Root directory searched (recursively) for track data.
trackdirectory = '../data/tracks'
# Set to 'extinction' or 'none'. Supplemental analyses are generated
# for extinction experiments.
experimenttype = 'extinction'
Set analysis options here:
In [3]:
# --- Analysis options ----------------------------------------------------
# Tracks whose valid-datapoint level falls below this cut-off are excluded
# for poor tracking quality.
acquisitionlevel = .85
# If True, excludes tracks where the trigger was never triggered.
# If False, tracks with no trigger are kept in the analysis.
notriggerexclude = True
In [4]:
# Dataframes generated below are accumulated here and combined at the end.
framelist = []
In [5]:
# Recursively locate the track and settings files belonging to this experiment.
data_root = Path(trackdirectory)
trackfiles = list(data_root.glob(f'**/*{experimentname}*track.csv'))
settingsfiles = list(data_root.glob(f'**/*{experimentname}*setup.txt'))

print(f'{len(trackfiles)} track files were found with the name {experimentname}')
print(f'{len(settingsfiles)} settings files were found with the name {experimentname}\n')

# Every run should produce exactly one settings file per track file.
if len(trackfiles) != len(settingsfiles):
    print('WARNING: Mismatched track and settings files.')
The number of experimental phases varies across experiments. This block identifies the phases used for the current experiment and verifies that all tracks have the same phase information.
The settings may vary between tracks within an experiment. This block also identifies the settings for each track and writes them to a dictionary.
In [6]:
# Parse every settings file and collect per-track configuration.
# NOTE(review): a single ConfigParser instance is reused across files;
# ConfigParser.read() merges into existing state, so values from a previous
# file survive unless the next file overwrites them — assumes all settings
# files share the same keys. TODO confirm.
Config = configparser.ConfigParser()
settingsdic = {}  # Dictionary used to store all settings information, keyed by track name.
phaselist = []  # List of phases used to verify phases are consistent across tracks.

# reads and organizes information from each settings file
for file in settingsfiles:
    Config.read(file)
    # generate clean list of stimuli
    stiminfo = Config.get('experiment', 'stimulus')  # gets stim info
    # strip parentheses/spaces and drop the leading element, leaving bare stim values
    stiminfo = stiminfo.replace('(', ',').replace(')', '').replace(' ', '').split(',')[1:]  # cleans stim list
    # generate clean list of phases
    phaselisttemp = Config.get('phases', 'phases_argstrings')  # gets phase info
    # strip the '-p ' flags and spaces; the trailing split element is empty, hence [:-1]
    phaselisttemp = phaselisttemp.replace('-p ', '').replace(' ', '').split(',')[:-1]  # cleans phase list
    # compare each phase list with the list from the previous settings file
    if len(phaselist) == 0:
        phaselist = phaselisttemp
    elif phaselist != phaselisttemp:
        print('Warning: Inconsistent phases between settings files.')
    else:
        pass
    # counts phases and generates phase variable names
    # (each phase contributes a time entry and a stim entry, hence the // 2)
    phasenumber = len(phaselist)//2
    phasenames = []
    for i in range(phasenumber):
        p, t, s = 'phase', 'time', 'stim'
        phase = p+str(i+1)
        phasetime = phase + t  # e.g. 'phase1time'
        phasestim = phase + s  # e.g. 'phase1stim'
        phasenames.extend((phasetime, phasestim))
    # gets settings info from filename (track/box)
    trackname = file.parts[-1].replace("-setup.txt", "")
    box = file.parts[-2]  # the parent directory names the recording box
    # gets settings info from setting file
    controller = Config.get('experiment', 'controller')
    trigger = Config.get('experiment', 'trigger')
    settings = [phaselisttemp, controller, trigger, stiminfo, box, str(file)]
    # puts all settings in dic keyed to trackname
    settingsdic[trackname] = settings
    # NOTE(review): `trackname`, `phasenames` and `phasenumber` leak out of this
    # loop and are used by later cells — after the loop they hold values from
    # the LAST settings file only.

# creates settings dataframe from settingsdic
dfsettings = pd.DataFrame(settingsdic).transpose()
dfsettings.columns = ['phases', 'controller', 'trigger', 'stimulus', 'box', 'file']
dfsettings['track'] = dfsettings.index

# creates stimulus dataframe, splits up and names stims
dfstim = pd.DataFrame(dfsettings.stimulus.values.tolist(), index=dfsettings.index).fillna('-')
for col in range(dfstim.shape[1]):
    dfstim = dfstim.rename(columns={col: ('stim_setting' + str(col))})
# NOTE(review): dfstim is built and renamed but never appended to framelist —
# confirm whether it should be part of the combined dataframe.

framelist.append(dfsettings)
dfsettings.head(3)
Out[6]:
This block extracts phase info from settings with trackname and calculates phase times.
This code currently assumes all phase times are the same across tracks within the experiment. This will need to be rewritten if we want to start running analyses across multiple studies with different phase times.
In [7]:
# Extract phase durations for this experiment and derive cumulative phase
# boundary times (in seconds).
# NOTE(review): `trackname` is whatever the settings loop above left behind,
# i.e. the LAST settings file read — this assumes phase times are identical
# across all tracks in the experiment.
phaseinfo = settingsdic.get(trackname)[0]
# keep only the entries containing digits (the durations), as ints, in seconds
phaseinfo = [x for x in phaseinfo if any(c.isdigit() for c in x)]
phaseinfo = [int(x) * 60 for x in phaseinfo]  # minutes -> seconds
phaselen = len(phaseinfo)

# FIX: build the cumulative boundaries [0, t1, t1+t2, ...] directly instead of
# the previous sum-slices + insert-zero hack (flagged in the original comment).
phaset = [0]
for duration in phaseinfo:
    phaset.append(phaset[-1] + duration)

# phase number (1-based) -> [start second, end second]
phasedic = {}
for i in range(phaselen):
    phasedic[i + 1] = [phaset[i], phaset[i + 1]]

# splits up and names the phases
dfphase = pd.DataFrame(dfsettings.phases.values.tolist(), index=dfsettings.index).fillna('-')
dfphase.columns = phasenames
phasenum = len(dfphase.columns) // 2  # each phase has a time column and a stim column
framelist.append(dfphase)
dfphase.head(3)
Out[7]:
In [8]:
# Compute per-track statistics via the ATLeS TrackProcessor and assemble them
# into one dataframe (one row per track).
# FIX: DataFrame.append() was removed in pandas 2.0 and growing a frame inside
# a loop is quadratic — collect the per-track frames and concatenate once.
statframes = []
for track in trackfiles:
    # gets track name from file name
    trackname = track.parts[-1].replace("-track.csv", "")
    # gets stats from TrackProcessor (ATLeS analysis class)
    processor = TrackProcessor(str(track), normalize_x_with_trigger='xpos < 0.50')
    tempstatsdic = processor.get_stats(include_phases=True)  # nested dict: {phase: {stat: value}}
    # flattens dictionary into dataframe, from https://stackoverflow.com/questions/13575090/
    dftemp = pd.DataFrame.from_dict(
        {(i, j): tempstatsdic[i][j] for i in tempstatsdic.keys() for j in tempstatsdic[i].keys()},
        orient='index')
    # transposes dataframe and adds track as index
    dftemp = dftemp.transpose()
    dftemp['track'] = trackname
    dftemp.set_index('track', inplace=True)
    statframes.append(dftemp)

dfstats = pd.concat(statframes, sort=True) if statframes else pd.DataFrame()

# normalize phase column labels, then flatten (phase, stat) tuples to 'phase|stat'
if 'phase 0' in dfstats.columns:
    dfstats.rename({'phase 0': 'p1', 'phase 1': 'p2', 'phase 2': 'p3'}, axis='columns', inplace=True)
dfstats.columns = dfstats.columns.map('|'.join)
framelist.append(dfstats)
dfstats.head(3)
Out[8]:
In [9]:
# Supplemental extinction-experiment statistics (one row per track).
# FIX: dfextstats is now initialized unconditionally — previously the final
# .head(3) raised a NameError whenever experimenttype != 'extinction'.
dfextstats = pd.DataFrame()
if experimenttype == 'extinction':
    # FIX: DataFrame.append() was removed in pandas 2.0; collect and concat once.
    extframes = []
    for track in trackfiles:
        # gets track name from file name
        trackname = track.parts[-1].replace("-track.csv", "")
        # gets advanced stats from TrackProcessor (ATLeS analysis class)
        processor = TrackProcessor(str(track))  # passes track to track processor and returns track object
        tempstatsdic = processor.get_exp_stats('extinction')  # gets stats from track object
        dftemp3 = pd.DataFrame(tempstatsdic, index=[0])
        dftemp3['track'] = trackname
        dftemp3.set_index('track', inplace=True)
        extframes.append(dftemp3)
    if extframes:
        dfextstats = pd.concat(extframes, sort=True)
    framelist.append(dfextstats)
else:
    print('Extinction experiment not selected in Parameters section.')
dfextstats.head(3)
Out[9]:
In [10]:
# Merge the settings, phase, stats (and extinction) frames into one master df.
df = pd.concat(framelist, axis=1, sort=False)
# Discard index entries that picked up no data in any frame.
df = df.dropna(axis=0, how='all')
df.head(3)
Out[10]:
In [11]:
# Quick structural overview of the combined dataframe: shape and column
# names grouped by dtype.
print(f'Dataframe Shape:{df.shape}')
print()
print('Column Names by DataType')
for dtype in df.dtypes.unique():
    print(f'Data Type, {dtype}:')
    dtype_cols = df.select_dtypes(include=[dtype]).columns
    print(*dtype_cols, sep=', ')
    print()
# print('Number of Tracks with Null Data by Column:') #fix this
# print(df[df.isnull().any(axis=1)][df.columns[df.isnull().any()]].count())
# print()
In [12]:
# Distribution of total track time per box — a sanity check for early exits.
tracktime = df['all|Total time (sec)']
print(f'Track Times: Mean {tracktime.mean()}, Minimum {tracktime.min()}, Maximum {tracktime.max()}, Count {tracktime.count()}')
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
ax.ticklabel_format(useOffset=False)  # prevents appearance of scientific notation on y axis
df.boxplot(column='all|Total time (sec)', by='box', ax=ax)
Out[12]:
In [13]:
# Tracking quality (valid datapoints) per box.
valid = df['all|%Valid datapoints']
print(f'Valid Datapoints: Mean {valid.mean()}, Minimum {valid.min()}, Maximum {valid.max()}, Count {valid.count()}')
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
df.boxplot(column='all|%Valid datapoints', by='box', ax=ax)
Out[13]:
In [14]:
# Trigger counts during the learning phase (phase 2), summarized per box.
# FIX: the summary line previously mixed columns — the mean came from
# 'phase 2|#Triggers' while min/max/count came from 'all|#Triggers'. Use the
# phase-2 column throughout, matching the boxplot below and the later
# no-trigger exclusion rule.
trig = df['phase 2|#Triggers']
print(f'Number of Triggers: Mean {trig.mean()}, Minimum {trig.min()}, Maximum {trig.max()}, Count {trig.count()}')
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
df.boxplot(column='phase 2|#Triggers', by='box', ax=ax)
Out[14]:
In [15]:
# Exclude unusable tracks, reporting the remaining count after each filter.
print(f'Raw Track Number: {df.shape[0]}')

# Drop tracks shorter than 75% of the mean run length — these are early
# terminations. (The original comment claimed this dropped rows with missing
# data; it does not.)
tracktime = df['all|Total time (sec)']
df = df.drop(df[tracktime < tracktime.mean() * .75].index)
print(f'Modified Track Number: {df.shape[0]} (following removal of tracks less than 75% the length of the experiment mean)')

# Drop tracks with poor acquisition quality.
# NOTE(review): acquisitionlevel is 0.85 but the message renders it as '0.85%';
# confirm whether 'all|%Valid datapoints' is a fraction (0-1) or a percentage —
# if it is 0-100, this filter removes nothing.
df = df.drop(df[df['all|%Valid datapoints'] < acquisitionlevel].index)
print(f'Modified Track Number: {df.shape[0]} (following removal for poor tracking set at less than {acquisitionlevel}% valid datapoints)')

if notriggerexclude:  # idiom fix: was 'notriggerexclude == True'
    # drops rows if there was no trigger during phase 2
    # TODO: make this work when the learning phase is not phase 2
    df = df.drop(df[df['phase 2|#Triggers'] == 0].index)
    print(f'Modified Track Number: {df.shape[0]} (following removal of tracks with no triggers during the learning)')
In [16]:
# Per-box summary of the trigger conditions (count / unique / top / freq).
dftrig = df.groupby('box')['trigger'].describe()
dftrig
Out[16]:
In [17]:
# Build the list of boxes usable for trigger-condition comparisons: boxes with
# fewer than two trigger conditions cannot support a between-condition test.
boxlist = df.box.unique().tolist()  # all boxes in the experiment
# FIX: bracket access instead of `dftrig.unique` — attribute access on a column
# named 'unique' reads like a method call and breaks if pandas ever adds a
# DataFrame.unique attribute; removed the dead `else: pass` branch.
onetriglist = dftrig.index[dftrig['unique'] < 2].tolist()
boxlist = [box for box in boxlist if box not in onetriglist]
if len(onetriglist) > 0:
    print(f'WARNING: The following boxes had only one trigger condition: {onetriglist}. These boxes removed from trigger analyses below.')
In [18]:
# Welch's t-tests comparing every float column between the two trigger
# conditions, per box, against a Bonferroni-style corrected alpha.
print(f'Trigger Conditions: {df.trigger.unique()}')
print()
from scipy.stats import ttest_ind

# FIX: hoisted loop invariants — the float-column list and corrected alpha
# were previously recomputed on every inner-loop iteration, and each group
# subset was computed up to three times.
floatcols = df.select_dtypes(include=['float64']).columns
alpha = .05 / len(floatcols)

for col in floatcols:
    for b in boxlist:
        dfbox = df[df.box == b]
        conditions = dfbox.trigger.unique()  # boxlist guarantees >= 2 conditions
        group1 = dfbox[dfbox.trigger == conditions[0]][col]
        group2 = dfbox[dfbox.trigger == conditions[1]][col]
        # Welch's t-test (does not assume equal variances), ignoring NaNs
        ttest_result = ttest_ind(group1, group2, equal_var=False, nan_policy='omit')
        if ttest_result.pvalue < alpha:
            print(col)
            print(f' {b}: Welchs T-Test indicates significant difference by trigger condition, p = {ttest_result.pvalue}')
            print(f' Trigger Condition 1 Mean: {group1.mean()}')
            print(f' Trigger Condition 2 Mean: {group2.mean()}')
            print()
In [31]:
def betweensubjectANOVA(dependentvar, betweenfactor, suppress):
    """Run a one-way between-subjects ANOVA of `dependentvar` by `betweenfactor`
    on the global `df`, printing (and plotting) significant results.

    Significance uses a Bonferroni-style alpha: .05 divided by the number of
    float columns screened. If `suppress` is True, non-significant results
    are not printed.
    """
    try:
        anovaresult = pp.anova(dv=dependentvar, between=betweenfactor, data=df, detailed=True, export_filename=None)
        # p-value of the between-factor row in the ANOVA table
        pvalue = anovaresult.loc[anovaresult.Source == betweenfactor]['p-unc'].values[0]
        alpha = .05 / len(df.select_dtypes(include=['float64']).columns)
        if pvalue >= alpha:
            if not suppress:  # idiom fix: was 'suppress == False'
                print(f'{dependentvar}')
                print(f' NOT significant: One-way ANOVA conducted testing {betweenfactor} as significant predictor of {dependentvar}. P = {pvalue}')
                print()
        else:
            print(f'{dependentvar}')
            print(f' SIGNIFICANT: One-way ANOVA conducted testing {betweenfactor} as significant predictor of {dependentvar}. P = {pvalue}')
            fig, ax = plt.subplots(1, 1, figsize=(6, 6))
            df.boxplot(column=dependentvar, by=betweenfactor, ax=ax)
            print()
    # FIX: was a bare `except:` that silently swallowed every error (including
    # KeyboardInterrupt); catch Exception and report the reason.
    except Exception as err:
        print(f'{dependentvar} analysis failed. Check descriptives. ({err})')


# Screen every float column for a box effect, suppressing non-significant output.
for col in df.select_dtypes(include=['float64']).columns:
    betweensubjectANOVA(col, 'box', True)
In [33]:
# Per-phase average normed x position split by box (panels share the y axis).
fig, axes = plt.subplots(1, 3, figsize=(15, 6), sharey=True)
df.boxplot(
    column=['phase 1|Avg. normed x coordinate', 'phase 2|Avg. normed x coordinate', 'phase 3|Avg. normed x coordinate'],
    by='box', ax=axes)
Out[33]:
In [21]:
# Per-phase average y position split by box (panels share the y axis).
fig, axes = plt.subplots(1, 3, figsize=(15, 6), sharey=True)
df.boxplot(
    column=['phase 1|Avg. y coordinate', 'phase 2|Avg. y coordinate', 'phase 3|Avg. y coordinate'],
    by='box', ax=axes)
Out[21]:
In [22]:
# Per-phase average normed x position, pooled across all boxes.
fig, ax = plt.subplots(1, 1, figsize=(15, 6))
df.boxplot(
    column=['phase 1|Avg. normed x coordinate', 'phase 2|Avg. normed x coordinate', 'phase 3|Avg. normed x coordinate'],
    ax=ax)
Out[22]:
In [34]:
# Per-phase average y position, pooled across all boxes.
fig, ax = plt.subplots(1, 1, figsize=(15, 6))
df.boxplot(
    column=['phase 1|Avg. y coordinate', 'phase 2|Avg. y coordinate', 'phase 3|Avg. y coordinate'],
    ax=ax)
Out[34]:
In [26]:
# Per-phase position heatmaps for a single track.
# NOTE(review): `processor` is whatever the last loop iteration above left
# behind, so this plots only the LAST track processed — confirm that this is
# the intended example track rather than a hidden-state accident.
plotter = TrackPlotter(processor)
plotter.plot_heatmap(plot_type='per-phase')
In [28]:
# 'phase 1|Avg. normed x coordinate', 'phase 2|Avg. normed x coordinate', 'phase 3|Avg. normed x coordinate'
# aov = rm_anova(dv='DV', within='Time', data=df, correction='auto', remove_na=True, detailed=True, export_filename=None)
# print_table(aov)
In [29]:
# Repeated-measures ANOVA: does the average normed x coordinate differ across
# phases? Reshape from wide (one column per phase) to long (one row per
# track-phase combination) first.
dependentvar = 'Avg. normed x coordinate'
# FIX: DataFrame.append() was removed in pandas 2.0 — collect the per-phase
# frames and concatenate once. A for-loop replaces the manual while counter,
# and .rename() replaces the in-place mutation of columns.values.
phaseframes = []
for phasenumcount in range(1, phasenum + 1):
    colname = f'phase {str(phasenumcount)}|{dependentvar}'
    dftemp = df[[colname]].rename(columns={colname: dependentvar})
    dftemp['phase'] = phasenumcount  # within-subject factor level
    phaseframes.append(dftemp)
dfanova = pd.concat(phaseframes) if phaseframes else pd.DataFrame()
pp.rm_anova(dv='Avg. normed x coordinate', within='phase', data=dfanova, correction='auto', remove_na=True, detailed=False, export_filename=None)
Out[29]: