Download, Parse and Interrogate Apple Health Export Data

The first part of this program is all about getting the Apple Health export and putting it into an analyzable format; at that point it can be analyzed anywhere. The second part uses the SAS Scripting Wrapper for Analytics Transfer (SWAT) Python library to transfer the data to SAS Viya and analyze it there. The SWAT package provides native Python access to the SAS Viya codebase.

https://github.com/sassoftware/python-swat
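The SAS Viya piece is not shown in this portion of the notebook, but for context, the basic SWAT flow looks roughly like the sketch below. The host, port, and credentials are placeholders for whatever your Viya deployment uses, the table name hr is just an example, and HR_df refers to the heart rate data frame built later in this notebook.

# Sketch only: connect to CAS with SWAT and push a pandas DataFrame up for analysis.
# The host, port, user, and password are placeholders, not real values.
import swat

conn = swat.CAS('viya.example.com', 5570, 'username', 'password')

# upload_frame() copies a local pandas DataFrame into an in-memory CAS table
hr_tbl = conn.upload_frame(HR_df, casout=dict(name='hr', replace=True))

# CASTable objects support a pandas-like interface, with the work done server-side
print(hr_tbl.describe())

conn.close()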

This notebook was created from a desire to get my hands on the data collected by Apple Health, notably the heart rate information collected by Apple Watch. For this to work, the export file needs to be in a location accessible to Python. A little bit of searching told me that iCloud file access is problematic and that there were already a number of ways of reaching the file through the Google API if it was saved to Google Drive. I chose PyDrive. So for the end-to-end program to work with little user intervention, you will need to sign up for Google Drive, set up an application in the Google API console, and install the Google Drive app on your iPhone.

This may sound involved, and it is not necessary if you simply email the export file to yourself and copy it to a filesystem that Python can see. If you choose to do that, all of the Google Drive portion can be removed. I like the Google Drive process, though, as it keeps the manual work to a minimum.
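For reference, the manual path is just a file copy. Something like the sketch below, where the Downloads path is only an assumption about where you saved the emailed attachment, can stand in for the Google Drive cells further down.

# Sketch of the manual alternative: skip Google Drive and use a locally saved export.zip
import os
import shutil

local_export = os.path.expanduser('~/Downloads/export.zip')  # assumed location of the emailed file
os.makedirs('healthextract', exist_ok=True)
shutil.copy(local_export, 'healthextract/export.zip')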

This version requires the user to grant Google access, which adds a few clicks, but not many. I think it is also possible to automate this to run without any user intervention by caching the credentials in a security file.
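A sketch of that idea, following the credential-caching pattern from the PyDrive documentation, is below; mycreds.txt is just an assumed name for the saved credential file.

# Sketch: cache the OAuth credentials so later runs do not need a browser.
from pydrive.auth import GoogleAuth

gauth = GoogleAuth()
gauth.LoadCredentialsFile("mycreds.txt")   # assumed credential cache file
if gauth.credentials is None:
    gauth.LocalWebserverAuth()             # first run: interactive consent in the browser
elif gauth.access_token_expired:
    gauth.Refresh()                        # expired token: refresh silently
else:
    gauth.Authorize()                      # cached token is still valid
gauth.SaveCredentialsFile("mycreds.txt")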

The first step to enabling this process is exporting the data from Apple Health. As of this writing, open Apple Health and tap your user icon or photo. Near the bottom of the next page in the app is a button or link called Export Health Data. Tapping it generates a zipped XML file. The next dialog asks where you want to save it; the options include email, save to iCloud, Messages, and so on. Select Google Drive. Google Drive allows multiple files with the same name, and this program accounts for that.


In [58]:
import xml.etree.ElementTree as et
import pandas as pd
import numpy as np
from datetime import *

import matplotlib.pyplot as plt
import re 
import os.path
import zipfile
import pytz

%matplotlib inline
plt.rcParams['figure.figsize'] = 16, 8

Authenticate with Google

This will open a browser to let you begin the process of authenticating with an existing Google Drive account. That part of the process happens outside of Python. For this to work, you will need to set up an "Other" OAuth client credential at https://console.developers.google.com/apis/credentials, save the secret file in your root directory, and do a few other things that are detailed at https://pythonhosted.org/PyDrive/. The PyDrive instructions also show you how to set up your Google application. There are other methods for accessing the Google API from Python, but this one seems pretty nice. The first time through the process, a regular sign-in and two-factor authentication are required (if you have two-factor auth enabled), but after that it is just a matter of telling Google that it is OK for your application to access Drive.


In [59]:
# Authenticate into Google Drive
from pydrive.auth import GoogleAuth

gauth = GoogleAuth()
gauth.LocalWebserverAuth()

Download the most recent Apple Health export file

Now that we are authenticated with Google Drive, use PyDrive to access the API and list the stored files.

Google Drive allows multiple files with the same name, but it indexes them by ID to keep them separate. In this block, we make one pass over the file list, match file names of the form export.zip, and keep the ID of the file with the most recent creation date. That file ID is used later to download the correct file. Apple Health names the export file export.zip, and at the time this was written there is no other option.


In [60]:
from pydrive.drive import GoogleDrive
drive = GoogleDrive(gauth)

file_list = drive.ListFile({'q': "'root' in parents and trashed=false"}).GetList()

# Step through the file list and find the most recent export.zip file id, then use
#      that later to download the file to the local machine.
# This may look a little old school, but these file lists will never be massive and
#     it is a readable, one-pass way to get the most current file using few resources
selection_dt = datetime.strptime("2000-01-01T01:01:01.001Z","%Y-%m-%dT%H:%M:%S.%fZ")
print("Matching Files")
for file1 in file_list: 
    if re.search(r"^export-?\d*\.zip",file1['title']):
        dt = datetime.strptime(file1['createdDate'],"%Y-%m-%dT%H:%M:%S.%fZ")
        if dt > selection_dt:
            selection_id = file1['id']
            selection_dt = dt
        print('    title: %s, id: %s createDate: %s' % (file1['title'], file1['id'], file1['createdDate']))


Matching Files
    title: export.zip, id: 1MGKM6NFFF8uA8kN6uLFR2Pp610J1n2hd createDate: 2019-03-22T01:22:20.472Z
    title: export.zip, id: 19RBGmuEjup-os1oaK1dD2NCN3Nu9nHUe createDate: 2019-03-10T03:39:11.919Z
    title: export.zip, id: 1k64iSwoZs7iXb6Lzw2aJtXNpV_BwsZ_H createDate: 2018-12-02T11:22:28.198Z
    title: export.zip, id: 1rbdnuJd3lz_Y5KYYRBduBGHVJL7ifgZx createDate: 2018-10-31T00:37:52.023Z
    title: export.zip, id: 1m94M1dAFbrvuPcs8chp3_k7e6RbjfHWj createDate: 2018-10-23T21:05:49.963Z
    title: export.zip, id: 1Ivk4xbO4_Zy0JFMQPQAxklcq1QWvIizR createDate: 2018-08-31T08:31:01.717Z
    title: export.zip, id: 1Ms7_cm2bSUSP1f3aISTJjbtBFqRiQkdt createDate: 2018-08-15T18:54:08.171Z
    title: export-11.zip, id: 1z6aWf0Lg3G-QlvXJVucWJ-5P2EPFV66k createDate: 2018-06-21T12:44:25.865Z
    title: export-11.zip, id: 1A8dxwlEothYJ5psDOF2k-NVGBpvgnpUM createDate: 2018-05-25T11:13:25.764Z
    title: export-11.zip, id: 1FbkSLU4nU91RWI9vPOFlkyoVay7epnTj createDate: 2018-05-07T02:47:56.288Z
    title: export-11.zip, id: 1Crp_iuXp55Fa17kQI_-MNG0Gu7l695jp createDate: 2018-04-30T16:02:45.667Z
    title: export-11.zip, id: 1BdHBJadk9p7huBjklQmwO5kAy1dH8Jzp createDate: 2018-04-14T12:28:56.458Z
    title: export-11.zip, id: 15jt-e0VNOo651kuYLs8fMROlATOtOrOZ createDate: 2018-03-05T19:31:56.540Z
    title: export-11.zip, id: 1iiu-5jwOn8MQtt3qPmy4J5YygmDGXiVN createDate: 2018-02-12T16:36:35.212Z
    title: export-11.zip, id: 1y62Nr8BnTVnJmNe8X5AtDR_WF5wuSvIE createDate: 2018-01-11T14:24:37.039Z
    title: export-11.zip, id: 1Eppibk5C1GFyCg1IuhMUTQO3Ej8euqea createDate: 2018-01-07T14:34:50.516Z
    title: export-11.zip, id: 0B_EXRCwLorf3U2NXcDRoUHFqc3c createDate: 2017-10-28T10:43:30.483Z
    title: export-11.zip, id: 0B_EXRCwLorf3U2NNYVE4LWU0eFU createDate: 2017-10-17T11:19:40.949Z
    title: export-11.zip, id: 0B_EXRCwLorf3T2J1ZldvMWJZTFU createDate: 2017-10-12T21:43:44.879Z
    title: export-11.zip, id: 0B_EXRCwLorf3WWRKLTZoTkdsTTA createDate: 2017-10-05T11:07:04.347Z
    title: export-11.zip, id: 0B_EXRCwLorf3ZXBJeUJhcTk0NVU createDate: 2017-10-02T18:40:04.586Z
    title: export-11.zip, id: 0B_EXRCwLorf3ZG03VmJOdnhhWk0 createDate: 2017-09-30T12:17:57.351Z
    title: export-11.zip, id: 0B_EXRCwLorf3UXFaVU9wMThZU2c createDate: 2017-09-25T14:10:50.746Z
    title: export-11.zip, id: 0B_EXRCwLorf3T2NvZk5RTDZocWs createDate: 2017-09-17T01:00:30.590Z
    title: export-11.zip, id: 0B_EXRCwLorf3VVlxSEp4TDNBODg createDate: 2017-09-14T13:06:49.191Z
    title: export-11.zip, id: 0B_EXRCwLorf3RVdWa3ROSXhIaFE createDate: 2017-09-12T10:32:56.415Z
    title: export-11.zip, id: 0B_EXRCwLorf3SnB6dWtCTXREb2M createDate: 2017-09-07T09:52:16.661Z
    title: export-11.zip, id: 0B_EXRCwLorf3MVRzWTdJRTZJYnM createDate: 2017-09-04T12:06:42.662Z
    title: export-11.zip, id: 0B_EXRCwLorf3SEdIMkZ1WnJVT00 createDate: 2017-09-03T12:07:11.463Z
    title: export-11.zip, id: 0B_EXRCwLorf3dTNNRjI1OU5xeHM createDate: 2017-09-02T12:07:44.902Z
    title: export-11.zip, id: 0B_EXRCwLorf3ejZ6UGNFSzJaRVU createDate: 2017-08-31T13:14:04.688Z
    title: export-11.zip, id: 0B_EXRCwLorf3YnRWcjlZQmhxZUU createDate: 2017-08-30T12:11:14.831Z
    title: export-11.zip, id: 0B_EXRCwLorf3cFBNaExLSUZhR0E createDate: 2017-08-29T12:47:24.030Z
    title: export-11.zip, id: 0B_EXRCwLorf3WURhQVNkNHNVOEU createDate: 2017-08-28T23:46:06.160Z
    title: export-11.zip, id: 0B_EXRCwLorf3WlBMTGdOUG8wbnM createDate: 2017-08-27T01:29:07.004Z
    title: export-11.zip, id: 0B_EXRCwLorf3NlFUd2NjcGNZWEU createDate: 2017-08-25T11:34:18.979Z
    title: export-11.zip, id: 0B_EXRCwLorf3S043LXpxQ25sMGM createDate: 2017-08-23T20:06:53.446Z
    title: export-11.zip, id: 0B_EXRCwLorf3ZmhPX0RfbXdVWFk createDate: 2017-08-22T10:03:47.420Z
    title: export-11.zip, id: 0B_EXRCwLorf3MmZGb2RYSExjc00 createDate: 2017-08-21T11:51:34.276Z
    title: export-11.zip, id: 0B_EXRCwLorf3NWpGSHZ6b3Q2bFk createDate: 2017-08-20T12:12:36.435Z
    title: export-11.zip, id: 0B_EXRCwLorf3QXN6cFgxd0d2Nkk createDate: 2017-08-19T10:01:14.488Z
    title: export-11.zip, id: 0B_EXRCwLorf3c1lmUENHQXFabDg createDate: 2017-08-18T10:42:58.920Z
    title: export-11.zip, id: 0B_EXRCwLorf3NS1zZlRHQmxvUms createDate: 2017-08-17T11:24:53.148Z
    title: export-11.zip, id: 0B_EXRCwLorf3T0lGWkUtOTN2S1k createDate: 2017-08-17T02:06:53.928Z
    title: export-11.zip, id: 0B_EXRCwLorf3clVrVkVLbnViODA createDate: 2017-08-16T17:50:38.608Z
    title: export-11.zip, id: 0B_EXRCwLorf3WmIyWE5TaTZGSE0 createDate: 2017-08-15T10:30:49.089Z
    title: export-11.zip, id: 0B_EXRCwLorf3RnFNQmt0ei1jTWM createDate: 2017-08-14T10:02:45.153Z
    title: export-11.zip, id: 0B_EXRCwLorf3QmhwQVQzSVBjeVE createDate: 2017-08-13T11:45:47.984Z
    title: export-11.zip, id: 0B_EXRCwLorf3UG1SanhScVZ1TVE createDate: 2017-08-12T11:19:15.692Z
    title: export-11.zip, id: 0B_EXRCwLorf3emYyM2Y5RmFnZ00 createDate: 2017-08-11T10:21:51.534Z
    title: export-11.zip, id: 0B_EXRCwLorf3cmZlYWhXUVZMOHM createDate: 2017-08-10T11:31:15.158Z
    title: export-11.zip, id: 0B_EXRCwLorf3TGNuZC1JQWE2UWs createDate: 2017-08-07T18:24:40.990Z
    title: export-11.zip, id: 0B_EXRCwLorf3U0NLTXluUDlmUTA createDate: 2017-08-07T14:18:22.915Z
    title: export-11.zip, id: 0B_EXRCwLorf3TVF4Zkt5X0V5MVE createDate: 2017-08-06T11:45:50.691Z
    title: export-11.zip, id: 0B_EXRCwLorf3REFCSG9XQ04wWDA createDate: 2017-08-05T11:30:20.282Z
    title: export-11.zip, id: 0B_EXRCwLorf3UXNMaUFrcDh3NmM createDate: 2017-08-04T09:53:55.749Z
    title: export-11.zip, id: 0B_EXRCwLorf3YWRkaGRGU0t1UmM createDate: 2017-08-03T11:04:26.398Z
    title: export-11.zip, id: 0B_EXRCwLorf3b2xtcm9CaG1BUGs createDate: 2017-08-02T12:56:31.144Z
    title: export-11.zip, id: 0B_EXRCwLorf3ZURpVWY0ZGdod2s createDate: 2017-08-01T10:38:44.404Z
    title: export-11.zip, id: 0B_EXRCwLorf3NzlPc1ByZzBNM1E createDate: 2017-07-31T12:12:53.571Z
    title: export-11.zip, id: 0B_EXRCwLorf3aWlxdlRHR0ZQUUU createDate: 2017-07-30T11:16:51.081Z
    title: export-11.zip, id: 0B_EXRCwLorf3Vlc0OHhBa2JkYkk createDate: 2017-07-29T10:53:01.569Z
    title: export-11.zip, id: 0B_EXRCwLorf3bk5UdzU2M184aWc createDate: 2017-07-28T18:02:15.345Z
    title: export-11.zip, id: 0B_EXRCwLorf3Z3hERGd3SFZVaFk createDate: 2017-07-27T18:51:01.097Z
    title: export-11.zip, id: 0B_EXRCwLorf3V095MTdoWTVLbTg createDate: 2017-07-26T12:30:12.925Z
    title: export-11.zip, id: 0B_EXRCwLorf3eWdpSlY3VU1IZWs createDate: 2017-07-25T11:21:26.000Z
    title: export-11.zip, id: 0B_EXRCwLorf3MVlrQllTMXB2S1k createDate: 2017-07-24T09:52:26.366Z
    title: export-11.zip, id: 0B_EXRCwLorf3M0t0S3lwWVAzTTA createDate: 2017-07-23T12:16:24.149Z
    title: export-11.zip, id: 0B_EXRCwLorf3SDl2N0pHeUtnTm8 createDate: 2017-07-22T10:31:00.528Z
    title: export-11.zip, id: 0B_EXRCwLorf3MEk1Zno0UURCSDg createDate: 2017-07-21T11:42:59.965Z
    title: export-10.zip, id: 0B_EXRCwLorf3Y0p4MGpTSEoyQlk createDate: 2017-07-20T10:07:03.464Z
    title: export-10.zip, id: 0B_EXRCwLorf3WnJ0cFNtSDJFRGM createDate: 2017-07-19T20:53:43.324Z
    title: export-9.zip, id: 0B_EXRCwLorf3elh0clFKbmM2dlU createDate: 2017-07-19T11:39:25.974Z
    title: export-9.zip, id: 0B_EXRCwLorf3aC1NREhuUFJhbTA createDate: 2017-07-18T12:02:42.631Z
    title: export-8.zip, id: 0B_EXRCwLorf3aFppa0hla1BCTVk createDate: 2017-07-17T18:04:02.804Z
    title: export-7.zip, id: 0B_EXRCwLorf3bzNEdV84NmZxUFE createDate: 2017-07-16T11:11:39.658Z
    title: export-7.zip, id: 0B_EXRCwLorf3eThvMFI4Nk96Nmc createDate: 2017-07-15T21:19:23.211Z
    title: export-6.zip, id: 0B_EXRCwLorf3dmNVV052eDVLNWs createDate: 2017-07-14T20:27:54.409Z
    title: export-6.zip, id: 0B_EXRCwLorf3eTNicTZ4OXkxOUE createDate: 2017-07-14T11:34:00.858Z
    title: export-6.zip, id: 0B_EXRCwLorf3MVNzaklpQjNlRW8 createDate: 2017-07-13T11:17:55.912Z
    title: export-5.zip, id: 0B_EXRCwLorf3UDdOdS1FVDljMlE createDate: 2017-07-12T09:26:12.919Z
    title: export-4.zip, id: 0B_EXRCwLorf3eHVlX3FzN1BrMWc createDate: 2017-07-10T12:07:32.447Z
    title: export-3.zip, id: 0B_EXRCwLorf3WUhOcE1mZzhZTHc createDate: 2017-06-30T10:01:11.615Z
    title: export.zip, id: 0B_EXRCwLorf3aGNVdlNWTWRrdm8 createDate: 2017-06-26T10:57:23.957Z

In [61]:
if not os.path.exists('healthextract'):
    os.mkdir('healthextract')

Download the file from Google Drive

Ensure that the file downloaded is the latest file generated


In [62]:
for file1 in file_list:
    if file1['id'] == selection_id:
        print('Downloading this file: %s, id: %s createDate: %s' % (file1['title'], file1['id'], file1['createdDate']))
        file1.GetContentFile("healthextract/export.zip")


Downloading this file: export.zip, id: 1MGKM6NFFF8uA8kN6uLFR2Pp610J1n2hd createDate: 2019-03-22T01:22:20.472Z

Unzip the most current file to a holding directory


In [63]:
with zipfile.ZipFile('healthextract/export.zip', 'r') as zip_ref:
    zip_ref.extractall('healthextract')

Parse Apple Health Export document


In [64]:
path = "healthextract/apple_health_export/export.xml"
e = et.parse(path)
#this was from an older iPhone, to demonstrate how to join files
legacy = et.parse("healthextract/apple_health_legacy/export.xml")

In [65]:
#<<TODO: Automate this process

#legacyFilePath = "healthextract/apple_health_legacy/export.xml"
#if os.path.exists(legacyFilePath):
#    legacy = et.parse("healthextract/apple_health_legacy/export.xml")
#else:
#    os.mkdir('healthextract/apple_health_legacy')

List XML headers by element count


In [66]:
pd.Series([el.tag for el in e.iter()]).value_counts()


Out[66]:
Record                              1292664
MetadataEntry                        117414
Location                              13986
ActivitySummary                         952
InstantaneousBeatsPerMinute              71
Workout                                  21
WorkoutRoute                              9
WorkoutEvent                              8
Correlation                               7
HeartRateVariabilityMetadataList          1
HealthData                                1
Me                                        1
ExportDate                                1
dtype: int64

List types for "Record" Header


In [67]:
pd.Series([atype.get('type') for atype in e.findall('Record')]).value_counts()


Out[67]:
HKQuantityTypeIdentifierActiveEnergyBurned          668164
HKQuantityTypeIdentifierHeartRate                   223593
HKQuantityTypeIdentifierBasalEnergyBurned           172918
HKQuantityTypeIdentifierDistanceWalkingRunning      109151
HKQuantityTypeIdentifierStepCount                   107601
HKQuantityTypeIdentifierAppleExerciseTime             6848
HKQuantityTypeIdentifierFlightsClimbed                4341
HKQuantityTypeIdentifierBodyTemperature                 11
HKQuantityTypeIdentifierBloodPressureSystolic            7
HKQuantityTypeIdentifierBloodPressureDiastolic           7
HKQuantityTypeIdentifierHeight                           4
HKQuantityTypeIdentifierBodyMass                         3
HKQuantityTypeIdentifierVO2Max                           1
HKQuantityTypeIdentifierHeartRateVariabilitySDNN         1
dtype: int64

Extract Values to Data Frame

TODO: Abstraction of the next code block


In [68]:
import pytz

#Extract the heartrate values, and get a timestamp from the xml
# there is likely a more efficient way, though this is very fast
def txloc(xdate,fmt):
    eastern = pytz.timezone('US/Eastern')
    dte = xdate.astimezone(eastern)
    return datetime.strftime(dte,fmt)

def xmltodf(eltree, element,outvaluename):
    dt = []
    v = []
    for atype in eltree.findall('Record'):
        if atype.get('type') == element:
            dt.append(datetime.strptime(atype.get("startDate"),"%Y-%m-%d %H:%M:%S %z"))
            v.append(atype.get("value"))

    myd = pd.DataFrame({"Create":dt,outvaluename:v})
    colDict = {"Year":"%Y","Month":"%Y-%m", "Week":"%Y-%U","Day":"%d","Hour":"%H","Days":"%Y-%m-%d","Month-Day":"%m-%d"}
    for col, fmt in colDict.items():
        myd[col] = myd['Create'].dt.tz_convert('US/Eastern').dt.strftime(fmt)

    myd[outvaluename] = myd[outvaluename].astype(float).astype(int)
    print('Extracting ' + outvaluename + ', type: ' + element)
  
    return(myd)

HR_df = xmltodf(e,"HKQuantityTypeIdentifierHeartRate","HeartRate")


Extracting HeartRate, type: HKQuantityTypeIdentifierHeartRate

In [69]:
EX_df = xmltodf(e,"HKQuantityTypeIdentifierAppleExerciseTime","Extime")
EX_df.head()


Extracting Extime, type: HKQuantityTypeIdentifierAppleExerciseTime
Out[69]:
   Create                     Extime  Year  Day  Month-Day  Month    Week     Days        Hour
0  2016-07-01 15:30:07-04:00  1       2016  01   07-01      2016-07  2016-26  2016-07-01  15
1  2016-07-02 10:11:27-04:00  1       2016  02   07-02      2016-07  2016-26  2016-07-02  10
2  2016-07-02 10:12:27-04:00  1       2016  02   07-02      2016-07  2016-26  2016-07-02  10
3  2016-07-02 10:19:29-04:00  1       2016  02   07-02      2016-07  2016-26  2016-07-02  10
4  2016-07-02 12:01:45-04:00  1       2016  02   07-02      2016-07  2016-26  2016-07-02  12

In [70]:
# uncomment the lines below if you have a legacy export to join
# extract legacy data, create a heart rate frame to concatenate with the newer data
#HR_df_leg = xmltodf(legacy,"HKQuantityTypeIdentifierHeartRate","HeartRate")
#HR_df = pd.concat([HR_df_leg,HR_df])

In [71]:
#import pytz
#eastern = pytz.timezone('US/Eastern')
#st = datetime.strptime('2017-08-12 23:45:00 -0400', "%Y-%m-%d %H:%M:%S %z")
#ed = datetime.strptime('2017-08-13 00:15:00 -0400', "%Y-%m-%d %H:%M:%S %z")
#HR_df['c2'] = HR_df['Create'].dt.tz_convert('US/Eastern').dt.strftime("%Y-%m-%d")

In [72]:
#HR_df[(HR_df['Create'] >= st) & (HR_df['Create'] <= ed)  ].head(10)

In [73]:
#reset plot - just for tinkering 
plt.rcParams['figure.figsize'] = 30, 8

In [74]:
HR_df.boxplot(by='Month',column="HeartRate", return_type='axes')
plt.grid(axis='x')
plt.title('All Months')
plt.ylabel('Heart Rate')
plt.ylim(40,140)


Out[74]:
(40, 140)

In [75]:
dx = HR_df[HR_df['Year']=='2019'].boxplot(by='Week',column="HeartRate", return_type='axes')
plt.title('All Weeks')
plt.ylabel('Heart Rate')
plt.xticks(rotation=90)
plt.grid(axis='x')
[plt.axvline(_x, linewidth=1, color='blue') for _x in [10,12]]
plt.ylim(40,140)


Out[75]:
(40, 140)

In [76]:
monthval = '2019-03'
#monthval1 = '2017-09'
#monthval2 = '2017-10'
#HR_df[(HR_df['Month']==monthval1) | (HR_df['Month']== monthval2)].boxplot(by='Month-Day',column="HeartRate", return_type='axes')
HR_df[HR_df['Month']==monthval].boxplot(by='Month-Day',column="HeartRate", return_type='axes')
plt.grid(axis='x') 
plt.rcParams['figure.figsize'] = 16, 8
plt.title('Daily for Month: '+ monthval)
plt.ylabel('Heart Rate')
plt.xticks(rotation=90)
plt.ylim(40,140)


Out[76]:
(40, 140)

In [53]:
HR_df[HR_df['Month']==monthval].boxplot(by='Hour',column="HeartRate")
plt.title('Hourly for Month: '+ monthval)
plt.ylabel('Heart Rate')
plt.grid(axis='x')
plt.ylim(40,140)


Out[53]:
(40, 140)

import calmap

ts = pd.Series(HR_df['HeartRate'].values, index=HR_df['Days'])
ts.index = pd.to_datetime(ts.index)
tstot = ts.groupby(ts.index).median()

plt.rcParams['figure.figsize'] = 16, 8
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
calmap.yearplot(data=tstot, year=2017)

Flag Chemotherapy Days for specific analysis

The next two cells provide the ability to introduce cycles that start on specific days and include this information in the dataset so that the cycles can be overlaid in graphics. In the example below, the cycles are 21 days long. The getDelta function returns the cycle number when ttp == 0 and the days since day 0 of that cycle otherwise (it is called with ttp == 1 below). This allows the cycles to be overlaid, aligned on the days since day 0.


In [21]:
# This isn't efficient yet, just a first pass. It functions as intended.
def getDelta(res,ttp,cyclelength):
    # keep only offsets that fall inside a cycle; everything else becomes 999
    mz = [x if (x >= 0) & (x < cyclelength) else 999 for x in res]
    if ttp == 0:
        return(mz.index(min(mz))+1)   # cycle number
    else:
        return(mz[mz.index(min(mz))]) # days since day 0 of that cycle

#chemodays = np.array([date(2017,4,24),date(2017,5,16),date(2017,6,6),date(2017,8,14)])
chemodays = np.array([date(2018,1,26),date(2018,2,2),date(2018,2,9),date(2018,2,16),date(2018,2,26),date(2018,3,2),date(2018,3,19),date(2018,4,9),date(2018,5,1),date(2018,5,14),date(2018,6,18),date(2018,7,10),date(2018,8,6)])

HR_df = xmltodf(e,"HKQuantityTypeIdentifierHeartRate","HeartRate")
#I don't think this is efficient yet...
# for each reading, compute the day offsets from every chemo start date
a = HR_df['Create'].apply(lambda x: [d.days for d in x.date()-chemodays])
HR_df['ChemoCycle'] = a.apply(lambda x: getDelta(x,0,21))
HR_df['ChemoDays'] = a.apply(lambda x: getDelta(x,1,21))


Extracting HeartRate, type: HKQuantityTypeIdentifierHeartRate

In [22]:
import seaborn as sns
plotx = HR_df[HR_df['ChemoDays']<=21]
plt.rcParams['figure.figsize'] = 24, 8
ax = sns.boxplot(x="ChemoDays", y="HeartRate", hue="ChemoCycle", data=plotx, palette="Set2",notch=1,whis=0,width=0.75,showfliers=False)
plt.ylim(65,130)
# groupby() moves ChemoDays into the index of the result
plotx_med = plotx.groupby('ChemoDays').median()
# put ChemoDays back as a regular column (groupby(..., as_index=False) would avoid this round trip)
plotx_med.index.name = 'ChemoDays'
plotx_med.reset_index(inplace=True)

snsplot = sns.pointplot(x='ChemoDays', y="HeartRate", data=plotx_med,color='Gray')


/Users/samuelcroker/Applications/anaconda/envs/Python3.5/lib/python3.5/site-packages/seaborn/categorical.py:478: FutureWarning: remove_na is deprecated and is a private function. Do not use.
  box_data = remove_na(group_data[hue_mask])
/Users/samuelcroker/Applications/anaconda/envs/Python3.5/lib/python3.5/site-packages/seaborn/categorical.py:1424: FutureWarning: remove_na is deprecated and is a private function. Do not use.
  stat_data = remove_na(group_data)

Boxplots Using Seaborn


In [23]:
import seaborn as sns
sns.set(style="ticks", palette="muted", color_codes=True)

sns.boxplot(x="Month", y="HeartRate", data=HR_df,whis=np.inf, color="c")
# Add in points to show each observation
snsplot = sns.stripplot(x="Month", y="HeartRate", data=HR_df,jitter=True, size=1, alpha=.15, color=".3", linewidth=0)


/Users/samuelcroker/Applications/anaconda/envs/Python3.5/lib/python3.5/site-packages/seaborn/categorical.py:450: FutureWarning: remove_na is deprecated and is a private function. Do not use.
  box_data = remove_na(group_data)

In [24]:
hr_only = HR_df[['Create','HeartRate']]
hr_only.tail()


Out[24]:
        Create                     HeartRate
220457  2019-03-09 19:25:45-05:00  76
220458  2019-03-09 19:33:16-05:00  112
220459  2019-03-09 19:36:50-05:00  117
220460  2019-03-09 19:41:29-05:00  115
220461  2019-03-09 19:44:58-05:00  68

In [25]:
hr_only.to_csv('~/Downloads/stc_hr.csv')

In [ ]: