accelQC

  • author: Ieuan Clay
  • started: March 2015
  • last update: April 2015

accelQC is intended to provide basic QC functionality for checking raw data produced by 3D accelerometry devices.

Functionality provided:

  • Data import: see "test_data_clean"
  • Overview of data
    • wear time
    • range and distributions

References

Set up session


In [1]:
# analytics
import numpy as np
import pandas as pd
import scipy.io as scio
import scipy.signal as sp
import seaborn as sb


# utils
import os

In [2]:
# flags
TEST = True
VERBOSE = True

In [3]:
# test data path
# todo: flexible input
if TEST :
    datapath = os.path.abspath("test_data/")
else :
    datapath = os.getcwd()
print(datapath)


C:\Users\Ieuan and Katharina\ieuan_work\delft\dataQC\test_data

Import data

Data Overview

Test data

  • Test 1 and Test 2 are events with good data.
  • Error 1 has unexpected peaks at 1 Hz intervals in the spectrum.
  • Error 2 has an artefact caused by a 32-bit to 16-bit conversion (illustrated below).
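
As a rough illustration of the 32-bit to 16-bit issue (a minimal sketch, not taken from the test data): values that exceed the signed 16-bit range wrap around when cast, which would show up as abrupt jumps in the affected traces.

import numpy as np

# values that fit in 32 bits but exceed the signed 16-bit range (-32768..32767)
raw32 = np.array([32760, 32770, 40000, -40000], dtype=np.int32)

# a naive cast keeps only the low 16 bits, so out-of-range values wrap around
clipped16 = raw32.astype(np.int16)
print(clipped16)  # 32770 wraps to -32766, 40000 wraps to -25536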

In [44]:
if TEST :
    # read in test file, indexing on the first two columns (src and input file)
    accel = pd.read_csv(os.path.join(datapath, 'test_data_full.tsv'), header=0, sep="\t", na_filter=False)
    accel.fillna('', inplace=True)
    accel.set_index(['src', 'file'], inplace=True)
else :
    # TODO: flexible input for non-test data
    pass
accel.info()


<class 'pandas.core.frame.DataFrame'>
MultiIndex: 15842362 entries, (error_1.mat, 4097.log) to (test_2.txt, nan)
Data columns (total 4 columns):
t    float64
x    float64
y    float64
z    float64
dtypes: float64(4)

Basic Overview

  • datasets
  • wear time (see the sketch after this list)
  • ranges
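
Wear time is not estimated in the cells below; a common heuristic is to flag windows in which all three axes show very little variation as non-wear. The following is a minimal sketch of that idea, assuming accel is indexed as above; the sampling rate, window length and threshold are illustrative placeholders (in raw device units), not values from this project.

def estimate_wear_seconds(df, sample_rate_hz=100, window_s=30, std_threshold=10):
    # rolling standard deviation per axis; low variation on all axes ~ non-wear
    window = int(window_s * sample_rate_hz)
    stds = df[['x', 'y', 'z']].rolling(window, min_periods=window).std()
    worn = (stds > std_threshold).any(axis=1)
    return worn.sum() / float(sample_rate_hz)  # estimated wear time in seconds

# per-file estimate (thresholds depend on the device's units and sampling rate)
accel.groupby(level=['src', 'file']).apply(estimate_wear_seconds)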

In [45]:
## reduce size of test dataset for simplicity
## take the first 2 files from each source (if more than two exist)
if TEST :
    keep = [val for sublist in accel.groupby(level=[0]).groups.values()
            for val in pd.Series(sublist).unique()[:2]]
    accel = accel.loc[keep]
    accel.info()


<class 'pandas.core.frame.DataFrame'>
MultiIndex: 957647 entries, (error_2.mat, accel_node000001.csv) to (test_2.txt, nan)
Data columns (total 4 columns):
t    957647 non-null float64
x    957647 non-null float64
y    957647 non-null float64
z    957647 non-null float64
dtypes: float64(4)

In [46]:
## summary of files present in each source
accel.reset_index(level="file").groupby(level="src").aggregate({'file' : lambda x : len(np.unique(x))})


Out[46]:
file
src
error_1.mat 2
error_2.mat 2
test_1.mat 1
test_2.txt 1

In [47]:
## summary of datasets per input source and file
accel.groupby(level=['src', 'file']).aggregate({
    't': lambda x: len(x),                 # number of rows
    'x': lambda x: np.max(x) - np.min(x),  # range
    'y': lambda x: np.max(x) - np.min(x),  # range
    'z': lambda x: np.max(x) - np.min(x)   # range
})


Out[47]:
x y t z
src file
error_1.mat 4097.log 3514.000 2627.000 131876 3479.000
4098.log 1633.000 2059.000 131876 2804.000
error_2.mat accel_node000001.csv 65025.000 65279.000 233100 65276.000
accel_node000002.csv 64791.000 65279.000 247260 65275.000
test_1.mat nan 9.617 26.659 14015 23.199
test_2.txt nan 26625.000 29697.000 199520 36609.000
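
The ranges above already single out the two error files; plotting the per-axis value distributions makes the artefacts easier to see. Below is a minimal sketch using the seaborn import from the session setup (it assumes a recent seaborn with histplot; the choice of file and bin count are illustrative).

import matplotlib.pyplot as plt

# per-axis value distributions for one source; hard clipping or wrap-around
# near the int16 limits would point at the conversion issue in error_2.mat
subset = accel.loc['error_2.mat']
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, axis_name in zip(axes, ['x', 'y', 'z']):
    sb.histplot(subset[axis_name], bins=100, ax=ax)
    ax.set_title(axis_name)
plt.tight_layout()
plt.show()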

Spectral Density Estimation


In [ ]:
# http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.signal.welch.html#scipy.signal.welch
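
A possible starting point, sketched here under the assumption of a fixed sampling rate (fs is a placeholder, not a value taken from the test data): estimate the power spectral density per axis with scipy.signal.welch and look for the unexpected peaks at 1 Hz intervals described for error_1.mat above.

import matplotlib.pyplot as plt

fs = 100.0  # sampling frequency in Hz; placeholder, depends on the recording device

# Welch PSD per axis for one of the error_1 files
subset = accel.loc[('error_1.mat', '4097.log')]
for axis_name in ['x', 'y', 'z']:
    freqs, psd = sp.welch(subset[axis_name].values, fs=fs, nperseg=1024)
    plt.semilogy(freqs, psd, label=axis_name)
plt.xlabel("frequency (Hz)")
plt.ylabel("power spectral density")
plt.legend()
plt.show()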