accelQC

  • author: Ieuan Clay
  • started: March 2015
  • last update: April 2015

accelQC is intended to provide basic QC functionality for checking raw data produced by 3D accelerometry devices.

Functionality provided:

  • Data import: see "test_data_clean"
  • Overview of data
    • wear time
    • range and distributions

References

Set up session


In [1]:
# analytics
import numpy as np
import pandas as pd
import scipy.io as scio
import scipy.signal as sp
import seaborn as sb


# utils
import os

In [2]:
# flags
TEST = True
VERBOSE = True

In [3]:
# test data path
# todo: flexible input
if TEST :
    datapath = os.path.abspath("test_data/")
else :
    datapath = os.getcwd()
print(datapath)


C:\Users\Ieuan and Katharina\ieuan_work\delft\dataQC\test_data

Import data

Data Overview

Test data

  • Test 1 and Test 2 are events with good data.
  • Error 1 has unexpected peaks at 1 Hz intervals in the spectrum.
  • Error 2 has an artefact caused by a 32-bit to 16-bit conversion (illustrated below).
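
As a rough illustration of the 32-bit to 16-bit issue (a minimal sketch, not taken from the test data): values that exceed the signed 16-bit range wrap around when cast, which would show up as abrupt jumps in the affected traces.

import numpy as np

# values that fit in 32 bits but exceed the signed 16-bit range (-32768..32767)
raw32 = np.array([32760, 32770, 40000, -40000], dtype=np.int32)

# a naive cast keeps only the low 16 bits, so out-of-range values wrap around
clipped16 = raw32.astype(np.int16)
print(clipped16)  # 32770 wraps to -32766, 40000 wraps to -25536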

In [44]:
if TEST :
    # read in test file, indexing on the first two columns (src and input file)
    accel = pd.read_csv(os.path.join(datapath, 'test_data_full.tsv'), header=0, sep="\t", na_filter=False)
    accel.fillna('', inplace=True)
    accel.set_index(['src', 'file'], inplace=True)
else :
    # TODO: flexible input for non-test data
    pass
accel.info()


<class 'pandas.core.frame.DataFrame'>
MultiIndex: 15842362 entries, (error_1.mat, 4097.log) to (test_2.txt, nan)
Data columns (total 4 columns):
t    float64
x    float64
y    float64
z    float64
dtypes: float64(4)

Basic Overview

  • datasets
  • wear time (see the sketch after this list)
  • ranges
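
Wear time is not estimated in the cells below; a common heuristic is to flag windows in which all three axes show very little variation as non-wear. The following is a minimal sketch of that idea, assuming accel is indexed as above; the sampling rate, window length and threshold are illustrative placeholders (in raw device units), not values from this project.

def estimate_wear_seconds(df, sample_rate_hz=100, window_s=30, std_threshold=10):
    # rolling standard deviation per axis; low variation on all axes ~ non-wear
    window = int(window_s * sample_rate_hz)
    stds = df[['x', 'y', 'z']].rolling(window, min_periods=window).std()
    worn = (stds > std_threshold).any(axis=1)
    return worn.sum() / float(sample_rate_hz)  # estimated wear time in seconds

# per-file estimate (thresholds depend on the device's units and sampling rate)
accel.groupby(level=['src', 'file']).apply(estimate_wear_seconds)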

In [45]:
## reduce size of test dataset for simplicity
## take the first 2 files from each source (if more than two exist)
if TEST :
    keep = [val for sublist in accel.groupby(level=[0]).groups.values()
            for val in pd.Series(sublist).unique()[:2]]
    accel = accel.loc[keep]
    accel.info()


<class 'pandas.core.frame.DataFrame'>
MultiIndex: 957647 entries, (error_2.mat, accel_node000001.csv) to (test_2.txt, nan)
Data columns (total 4 columns):
t    957647 non-null float64
x    957647 non-null float64
y    957647 non-null float64
z    957647 non-null float64
dtypes: float64(4)

In [46]:
## summary of files present in each source
accel.reset_index(level="file").groupby(level="src").aggregate({'file' : lambda x : len(np.unique(x))})


Out[46]:
file
src
error_1.mat 2
error_2.mat 2
test_1.mat 1
test_2.txt 1

In [47]:
## summary of datasets per input source and file
accel.groupby(level=['src', 'file']).aggregate({
    't': lambda x: len(x),                 # number of rows
    'x': lambda x: np.max(x) - np.min(x),  # range
    'y': lambda x: np.max(x) - np.min(x),  # range
    'z': lambda x: np.max(x) - np.min(x)   # range
})


Out[47]:
x y t z
src file
error_1.mat 4097.log 3514.000 2627.000 131876 3479.000
4098.log 1633.000 2059.000 131876 2804.000
error_2.mat accel_node000001.csv 65025.000 65279.000 233100 65276.000
accel_node000002.csv 64791.000 65279.000 247260 65275.000
test_1.mat nan 9.617 26.659 14015 23.199
test_2.txt nan 26625.000 29697.000 199520 36609.000
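
The ranges above already single out the two error files; plotting the per-axis value distributions makes the artefacts easier to see. Below is a minimal sketch using the seaborn import from the session setup (it assumes a recent seaborn with histplot; the choice of file and bin count are illustrative).

import matplotlib.pyplot as plt

# per-axis value distributions for one source; hard clipping or wrap-around
# near the int16 limits would point at the conversion issue in error_2.mat
subset = accel.loc['error_2.mat']
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, axis_name in zip(axes, ['x', 'y', 'z']):
    sb.histplot(subset[axis_name], bins=100, ax=ax)
    ax.set_title(axis_name)
plt.tight_layout()
plt.show()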

Spectral Density Estimation


In [ ]:
# http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.signal.welch.html#scipy.signal.welch
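
A possible starting point, sketched here under the assumption of a fixed sampling rate (fs is a placeholder, not a value taken from the test data): estimate the power spectral density per axis with scipy.signal.welch and look for the unexpected peaks at 1 Hz intervals described for error_1.mat above.

import matplotlib.pyplot as plt

fs = 100.0  # sampling frequency in Hz; placeholder, depends on the recording device

# Welch PSD per axis for one of the error_1 files
subset = accel.loc[('error_1.mat', '4097.log')]
for axis_name in ['x', 'y', 'z']:
    freqs, psd = sp.welch(subset[axis_name].values, fs=fs, nperseg=1024)
    plt.semilogy(freqs, psd, label=axis_name)
plt.xlabel("frequency (Hz)")
plt.ylabel("power spectral density")
plt.legend()
plt.show()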